My recommendation would be to definitely use battery-backed raid controllers
for spinning disks (just a single-disk RAID 0 per drive) and IT-mode controllers for
SSDs. We did plenty of testing, and spinning disks REALLY benefit from raid
controller write cache (in write-back mode), but SSDs are faster
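For reference, a single-drive RAID 0 with write-back cache can be created with MegaCli along these lines (a sketch; the [252:0] enclosure:slot address and adapter 0 are placeholders that will differ per system):

# one RAID 0 logical drive from the disk in enclosure 252, slot 0,
# with write-back, no read-ahead, direct I/O, on adapter 0
/opt/MegaRAID/MegaCli/MegaCli64 -CfgLdAdd -r0 [252:0] WB NORA Direct -a0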
Hi,
our Dell servers contain "PERC H730P Mini" raid controllers with 2GB of battery-backed
cache memory.
All of our Ceph OSD disks (typically 12 x 8TB spinners or 16 x 1-2TB SSDs per
node) are used directly, without using the raid functionality.
We deactivated the cache of the controller for the
Quite a few others and I have had high random latency issues with disk
cache enabled.
, Ash
On Tue, 20 Nov 2018 at 9:09 PM, Alex Litvak wrote:
John,
If I go with write through, shouldn't disk cache be enabled?
On 11/20/2018 6:12 AM, John Petrini wrote:
I would disable cache on the controller for your journals. Use write
through and no read ahead. Did you make sure the disk cache is disabled?
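For what it's worth, that policy can be applied with MegaCli roughly like this (a sketch; -Lall applies it to every logical drive on adapter 0, so you would normally target only the journal LDs):

/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp WT -Lall -a0    # write through
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp NORA -Lall -a0  # no read ahead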
On Tuesday, November 20, 2018, Alex Litvak wrote:
I went through a raid controller firmware update and replaced a pair of SSDs with new ones. Nothing has changed. Per the controller card utility, no patrol read is happening and the battery
backup is in good shape. Cache policy is WriteBack. I am aware of the bad-battery effect, but it
Ah yes, sorry, that's because you're behind a raid card.
You need to check the raid config. On an HP card, for example, there is
an option called "enable disk cache".
This is separate from enabling the raid card cache; the setting should be per
drive (it is on HP), so it's worth checking the config outputs for
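For what it's worth, the per-drive cache setting can also be queried through the controller with smartctl (a sketch; the megaraid device ID 0 and /dev/sdb are placeholders):

# ask physical disk 0 behind the MegaRAID controller for its write cache state
smartctl -g wcache -d megaraid,0 /dev/sdb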
Hmm,
On all nodes
hdparm -W /dev/sdb
/dev/sdb:
SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0d 00 00 00 00 20 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
write-caching = not supported
On 11/18/2018 10:30 AM, Ashley Merrick wrote:
hdparm -W /dev/xxx should show you
On Mon, 19 Nov 2018 at 12:28 AM, Alex Litvak wrote:
All machines state the same.
/opt/MegaRAID/MegaCli/MegaCli64 -LDGetProp -DskCache -Lall -a0
Adapter 0-VD 0(target id: 0): Disk Write Cache : Disk's Default
Adapter 0-VD 1(target id: 1): Disk Write Cache : Disk's Default
I assume they are all on, which is actually bad, based on common sense.
I am not talking about the controller cache; you should check the SSD disk caches.
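If they do turn out to be enabled, they can be switched off through the controller (a sketch; -Lall and adapter 0 are placeholders, you would target the SSD logical drives):

/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp -DisDskCache -Lall -a0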
On Sun, Nov 18, 2018 at 11:40 AM Alex Litvak wrote:
All 3 nodes have this status for SSD mirror. Controller cache is on for all 3.
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
On 11/18/2018 12:45 AM, Serkan Çoban wrote:
Is write cache enabled on the SSDs on all three servers? Can you check them?
On Sun, Nov 18, 2018 at 9:05 AM Alex Litvak wrote:
Raid card for journal disks is a Perc H730 (Megaraid), RAID 1; battery-backed cache
is on.
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
I have 2 other nodes with older Perc H710 and
>10ms w_await for an SSD is too much. How is that SSD connected to the system? Is there any raid card installed in this system? What is the raid mode?
On Sun, Nov 18, 2018 at 8:25 AM Alex Litvak wrote:
Here is another snapshot. I wonder if this write io wait is too high.
Device:  rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s   avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14    0.00    0.00    0.00  23.00  0.00   336.00  29.22     0.34      14.74  0.00
I stand corrected: I looked at the device iostat, but it was partitioned. Here
is a more correct picture of what is going on now.
Device:  rrqm/s  wrqm/s  r/s   w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14    0.00    0.00    0.00
The iostat isn't very helpful because there are not many writes. I'd
recommend disabling C-states entirely; I'm not sure it's your problem, but it's
good practice, and if your cluster goes as idle as your iostat suggests, it
could be the culprit.
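One common way to disable them is via kernel boot parameters (a sketch; intel_idle.max_cstate applies to Intel CPUs, and the grub regeneration step varies by distro):

# /etc/default/grub: keep CPUs out of deep C-states
GRUB_CMDLINE_LINUX="... intel_idle.max_cstate=0 processor.max_cstate=1"
# then regenerate the config and reboot, e.g. on EL7:
grub2-mkconfig -o /boot/grub2/grub.cfg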
Plot thickens:
I checked C-states and apparently I am operating in C1 with all CPUs on.
Apparently the servers were tuned to use the latency-performance profile:
tuned-adm active
Current active profile: latency-performance
turbostat shows
Package  Core  CPU  Avg_MHz  %Busy  Bzy_MHz  TSC_MHz  SMI
You can check whether C-states are enabled with cat /proc/acpi/processor/info.
Look for "power management: yes/no".
If they are enabled, you can check the current C-state of each core: 0
is the CPU's normal operating state; any other state means the processor is
in a power-saving mode. cat
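On newer kernels /proc/acpi/processor may be missing; the same information is exposed through sysfs (a sketch):

# names of the C-states the kernel can use for cpu0
cat /sys/devices/system/cpu/cpu0/cpuidle/state*/name
# time spent in each of those states, in microseconds
cat /sys/devices/system/cpu/cpu0/cpuidle/state*/time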
John,
Thank you for the suggestions.
I looked into the journal SSDs. They are close to 3 years old, showing 5.17%
wear (352941 GB written to disk, against a 3.6 PB endurance spec over 5 years).
It could be that SMART is not telling the whole story, but that is what I see.
Vendor Specific SMART Attributes with Thresholds:
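For reference, the wear-related attributes can be pulled with smartctl (a sketch; /dev/sdb is a placeholder, and attribute names such as Media_Wearout_Indicator or Total_LBAs_Written vary by vendor):

smartctl -A /dev/sdb | egrep -i 'wear|written'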
I'd take a look at C-states if it's only happening during periods of
low activity. If your journals are on SSD, you should also check their
health. They may have exceeded their write endurance; high apply
latency is a telltale sign of this, and you'd see high iowait on those
disks.
I am evaluating bluestore on a separate cluster. Unfortunately,
upgrading this one is out of the question at the moment for multiple
reasons. That is why I am trying to find a possible root cause.
On 11/17/2018 2:14 PM, Paul Emmerich wrote:
Are you running FileStore? (The config options you are using look
like a FileStore config.)
Try out BlueStore; we've found that it reduces random latency spikes
due to filesystem weirdness a lot.
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
I am using libvirt for block devices (OpenStack and Proxmox KVM VMs).
I am also mounting CephFS inside the VMs and on bare-metal hosts. In
that case it would be a kernel-based client.
From what I can see based on pool stats, the CephFS pools have higher
utilization compared to the block pools during the
Hi Alex,
What kind of clients do you use? Is it KVM (QEMU) using the NBD driver,
the kernel client, or...?
Regards,
Kees
On 17-11-18 20:17, Alex Litvak wrote:
Hello everyone,
I am trying to troubleshoot a cluster exhibiting huge spikes of latency.
I cannot quite catch it because it happens during light activity and
randomly affects one OSD node out of the 3 in the pool.
This is a FileStore setup.
I see some OSDs exhibit an apply latency of 400 ms, 1
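For reference, per-OSD apply and commit latency can be watched live with the stock ceph CLI:

# shows fs_commit_latency(ms) and fs_apply_latency(ms) for every OSD
ceph osd perf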