Re: Please help! ZooKeeper 3.7.2 fsync-ing latency issue

2024-04-18 Thread Patrick Hunt
On Thu, Apr 18, 2024 at 9:15 AM Patrick Hunt wrote:

> My experience with slow fsyncs is that it's almost always due to
> contention for disk IO. I see that you tuned the snap* sizes down, which is
> reasonable. You might check what ZK activity is happening during this
> period? Perhaps some client is hammering the cluster, have you ruled
> that out?
>
>
Actually, one other thing (sorry, it's been a while since I have seen this):
it could be GC activity. If something (e.g. my point about client activity due
to some periodic event) causes a lot of memory pressure, perhaps the GC
is somehow impacting the fsync (or the activity around the fsync). Have you
tried running with GC logging enabled to see whether pauses line up with the
event?
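
One way to check this is to compare GC pause timestamps with the window in
which the slow fsync was running. Below is a minimal sketch, not a definitive
tool. It assumes the JVM writes unified GC logs with a wall-clock time
decoration (on JDK 9+, something like -Xlog:gc*:file=gc.log:time,level,tags,
passed for example through SERVER_JVMFLAGS if your start script honors it),
and that each pause line ends with the pause duration in ms. The function
names and the slack_ms parameter are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# ASSUMPTION: unified GC log lines that start with an ISO-8601 timestamp, e.g.
# [2024-04-18T17:30:05.123+0000][info][gc] GC(42) Pause Young ... 35.123ms
GC_PAUSE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+[+-]\d{4})\]"
    r".*\bPause\b.*?(?P<ms>\d+(?:\.\d+)?)ms\s*$"
)


def parse_gc_pause(line):
    """Return (logged_at, pause_ms) for a GC pause line, or None otherwise."""
    m = GC_PAUSE_RE.match(line)
    if m is None:
        return None
    logged_at = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f%z")
    return logged_at, float(m.group("ms"))


def pauses_during_fsync(warn_time, took_ms, pauses, slack_ms=1000):
    """Select GC pauses logged while the slow fsync was in progress.

    The warning is logged when the fsync finishes, so the fsync covered
    roughly [warn_time - took_ms, warn_time]; slack_ms widens both ends.
    """
    start = warn_time - timedelta(milliseconds=took_ms + slack_ms)
    end = warn_time + timedelta(milliseconds=slack_ms)
    return [(ts, ms) for ts, ms in pauses if start <= ts <= end]
```

If long pauses show up inside that window every day, GC is a plausible
suspect. If the window is empty, the time is more likely being spent in the
disk or filesystem layer.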

Patrick




Re: Please help! ZooKeeper 3.7.2 fsync-ing latency issue

2024-04-18 Thread Patrick Hunt
My experience with slow fsyncs is that it's almost always due to contention
for disk IO. I see that you tuned the snap* sizes down, which is
reasonable. You might check what ZK activity is happening during this
period? Perhaps some client is hammering the cluster, have you ruled
that out?
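
To see what the server is doing around that time, the mntr four-letter
command can be polled and its counters compared before and during the event.
Here is a rough sketch under some assumptions: mntr has to be allowed in
4lw.commands.whitelist on the server, the host and port defaults are
placeholders, and the helper names are made up for illustration.

```python
import socket


def parse_mntr(text):
    """Turn tab-separated `mntr` output into a dict, converting numbers."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if not sep:
            continue  # skip lines that are not key<TAB>value
        value = value.strip()
        for cast in (int, float):
            try:
                stats[key] = cast(value)
                break
            except ValueError:
                continue
        else:
            stats[key] = value  # keep non-numeric values as strings
    return stats


def fetch_mntr(host="127.0.0.1", port=2181, timeout=5.0):
    """Send `mntr` to a ZooKeeper server and return the parsed counters."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return parse_mntr(b"".join(chunks).decode("utf-8", errors="replace"))
```

Running fetch_mntr() every few seconds from shortly before 17:30 and watching
values such as zk_outstanding_requests, zk_packets_received and
zk_avg_latency should show whether a client burst precedes the slow fsync.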

I searched the mail archives; other folks have reported this issue, so you
might take a look. This thread in particular may be worth checking out:
https://lists.apache.org/thread/qjrlprmt7pdy63ztvjtvkd0f5zgw5dgk

Patrick



Please help! ZooKeeper 3.7.2 fsync-ing latency issue

2024-04-18 Thread Xu Bill
Hello,

I have a pretty weird issue with ZooKeeper.
Every day around 17:30, my ZooKeeper server logs a warning that says
"fsync-ing the write ahead log in SyncThread:0 took 36919ms which will
adversely effect operation latency.File size is 16777232 bytes.". This
causes the clients connected to ZooKeeper to time out, and I have to restart
them every day.
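
To confirm that the slow syncs really cluster around 17:30 and to see how
long each one took, the warning can be pulled out of the server log with a
short script. This is a minimal sketch that matches only the message text
quoted above. The log prefix (timestamp, level) depends on the logging
configuration and is not parsed here, and the function names and the
threshold_ms default are made up for illustration.

```python
import re

# Matches the warning text quoted above, wherever it appears in a log line.
FSYNC_RE = re.compile(
    r"fsync-ing the write ahead log in (?P<thread>\S+) took (?P<ms>\d+)ms"
    r".*?File size is (?P<size>\d+) bytes"
)


def parse_fsync_warning(line):
    """Return thread name, duration and file size from a warning line, or None."""
    m = FSYNC_RE.search(line)
    if m is None:
        return None
    return {
        "thread": m.group("thread"),
        "took_ms": int(m.group("ms")),
        "file_size": int(m.group("size")),
    }


def slow_fsyncs(lines, threshold_ms=1000):
    """Yield (raw_line, parsed) for warnings at or above threshold_ms."""
    for line in lines:
        parsed = parse_fsync_warning(line)
        if parsed is not None and parsed["took_ms"] >= threshold_ms:
            yield line.rstrip("\n"), parsed
```

For the message above this gives took_ms=36919 and file_size=16777232, which
is 16 MiB plus 16 bytes. So the file is no bigger than the preallocated size,
and the file size alone is unlikely to explain a 36-second sync.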

I don't think the txn log file is too big to be handled quickly, but I
still tried changing parameters to limit its size. Below is my
configuration.
preAllocSize=16M
snapCount=3
snapSizeLimitInKb=32M

Even with this configuration, I still got the warnings.

I also monitored the IO stats on the disk that holds the ZooKeeper data dir,
but they were the same as usual.

Can anybody suggest how to solve or investigate this issue?
I am using ZooKeeper 3.7.2.
While the warning was being logged, the IO stats were tps=122,
reading=20.1k/s, writing=2M/s.

Best regards,
Bill