Re: [ceph-users] slow request and unresponsive kvm guests after upgrading ceph cluster and os, please help debugging

2020-01-07 Thread Stefan Kooman
Quoting Paul Emmerich (paul.emmer...@croit.io):
> We've also seen some problems with FileStore on newer kernels; 4.9 is the
> last kernel that worked reliably with FileStore in my experience.
> 
> But I haven't seen problems with BlueStore related to the kernel version
> (well, except for that scrub bug, but my work-around for that is in all
> release versions).

What scrub bug are you talking about?

Gr. Stefan


-- 
| BIT BV   https://www.bit.nl/          Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow request and unresponsive kvm guests after upgrading ceph cluster and os, please help debugging

2020-01-07 Thread Stefan Kooman
Quoting Jelle de Jong (jelledej...@powercraft.nl):

> question 2: what systemd target can I use to run a service after all
> ceph-osds are loaded? I tried ceph.target and ceph-osd.target; both do
> not work reliably.

ceph-osd.target works for us (every time). Have you enabled all the
individual OSD services, e.g. ceph-osd@0.service?
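For reference, a small sketch of enabling the per-OSD units (the IDs 0-3 are an assumption; list the real ones with `ceph osd ls`):

```shell
# Build the list of per-OSD unit names, then enable them so that
# ceph-osd.target pulls them all in at boot.
units=""
for id in 0 1 2 3; do
    units="$units ceph-osd@${id}.service"
done
echo "$units"
# systemctl enable $units        # run as root on the OSD node
```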

> question 3: should I still try to upgrade to bluestore or pray to the system
> gods that my performance is back after many many hours of troubleshooting?

I would suggest the first; the second is optional ;-). Especially because
you have a separate NVMe device you can use for WAL / DB. It has
advantages over filestore ...

> I made a few changes; I am going to just list them for other people that are
> suffering from slow performance after upgrading their Ceph and/or OS.
> 
> Disk utilization is back around 10% no more 80-100%... and rados bench is
> stable again.
> 
> apt-get install irqbalance nftables

^^ Are these some of the changes? Do you need those packages in order
to unload / blacklist the modules?

I don't get what your fixes are, or what the problem was. Firewall
issues?

What Ceph version did you upgrade to?

Gr. Stefan




Re: [ceph-users] slow request and unresponsive kvm guests after upgrading ceph cluster and os, please help debugging

2020-01-07 Thread Jelle de Jong

Hello everybody,

I think I fixed the issues after weeks of looking.

question 1: does anyone know how to prevent iptables, nftables or conntrack 
from being loaded in the first place? Adding them to 
/etc/modprobe.d/blacklist.local.conf does not seem to work? What is 
recommended?
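For what it's worth, a common approach (a sketch, not tested on this cluster): `blacklist` entries only suppress alias-based autoloading, while an explicit `modprobe` by userspace tools still succeeds; overriding the module's install command makes any load attempt fail:

```shell
# "blacklist" only stops alias-based autoloading; an explicit
# "modprobe ip_tables" (e.g. triggered by iptables userspace) still
# succeeds. "install <module> /bin/false" makes any load attempt fail.
conf=$(mktemp)   # on a real host: /etc/modprobe.d/disable-netfilter.conf
for mod in ip_tables iptable_filter ip6_tables ip6table_filter nf_tables; do
    printf 'install %s /bin/false\n' "$mod"
done > "$conf"
cat "$conf"
# Rebuild the initramfs afterwards so the rules also apply at early boot:
#   update-initramfs -u -k all
```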


question 2: what systemd target can I use to run a service after all 
ceph-osds are loaded? I tried ceph.target and ceph-osd.target; both do 
not work reliably.
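Note that ceph-osd.target only guarantees the OSD units have been started, not that the OSDs are actually up. A unit ordered after it could look like this (a sketch; `osd-tuning.service` and the script path are made-up names):

```ini
# /etc/systemd/system/osd-tuning.service  (hypothetical name)
[Unit]
Description=Post-OSD host tuning
After=ceph-osd.target
Wants=ceph-osd.target

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/osd-tuning.sh

[Install]
WantedBy=multi-user.target
```

If it still fires before the OSDs are ready, an `ExecStartPre` delay or a check against `ceph osd stat` is a common workaround.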


question 3: should I still try to upgrade to bluestore or pray to the 
system gods that my performance is back after many many hours of 
troubleshooting?


I made a few changes; I am going to just list them for other people that 
are suffering from slow performance after upgrading their Ceph and/or OS.


Disk utilization is back around 10%, no more 80-100%... and rados bench 
is stable again.


apt-get install irqbalance nftables

# cat /etc/ceph/ceph.conf
[global]
fsid = 5f8d3724-1a51-4895-9b3e-5eb90ea49782
mon_initial_members = ceph01, ceph02, ceph03
mon_host = 192.168.35.11,192.168.35.12,192.168.35.13
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true

osd pool default size = 3
public network = 192.168.35.0/28
cluster network = 192.168.35.0/28
osd pool default min size = 2

osd scrub begin hour = 23
osd scrub end hour = 6

# default osd recovery max active = 3
osd recovery max active = 1

#setuser match path = /var/lib/ceph/$type/$cluster-$id

debug_default = 0
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcacher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
filestore_op_threads = 8
filestore_max_inline_xattr_size = 254
filestore_max_inline_xattrs = 6
filestore_queue_max_ops = 500
filestore_queue_committing_max_ops = 5000
filestore_merge_threshold = 40
filestore_split_multiple = 10
journal_max_write_entries = 1000
journal_queue_max_ops = 3000
journal_max_write_bytes = 1048576000
osd_mkfs_options_xfs = -f -I size=2048
osd_mount_options_xfs = noatime,largeio,nobarrier,inode64,allocsize=8M
osd_op_threads = 32
osd_journal_size = 1
filestore_queue_max_bytes = 1048576000
filestore_queue_committing_max_bytes = 1048576000
journal_queue_max_bytes = 1048576000
filestore_max_sync_interval = 10
filestore_journal_parallel = true

[client]
rbd cache = true
#rbd cache max dirty = 0
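As an aside, the mix of `osd pool default size` and `filestore_op_threads` styles above is harmless: Ceph treats spaces and underscores in option names interchangeably. A tiny sketch for normalizing names when grepping configs:

```shell
# "osd pool default size" and "osd_pool_default_size" name the same
# Ceph option; normalize to underscore form for consistent searching.
normalize() { echo "$1" | tr ' ' '_'; }
normalize "osd pool default size"
```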

# cat /etc/sysctl.d/30-nic-10gbit.conf
net.ipv4.tcp_rmem = 1000 1000 1000
net.ipv4.tcp_wmem = 1000 1000 1000
net.ipv4.tcp_mem = 1000 1000 1000
net.core.rmem_default = 524287
net.core.wmem_default = 524287
net.core.rmem_max = 524287
net.core.wmem_max = 524287
net.core.netdev_max_backlog = 30
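These settings only take effect after `sysctl --system` (or a reboot). A small sketch for applying a single line from such a drop-in at runtime:

```shell
# Split a sysctl drop-in line (the format used above) into key and
# value so it can be applied without a reboot.
line="net.core.rmem_max = 524287"
key=${line%% =*}
value=${line#*= }
echo "$key=$value"
# Apply one setting:        sysctl -w "$key=$value"   (as root)
# Or reload every drop-in:  sysctl --system
```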

Unload all forms of filtering; blacklisting does not work, they keep 
getting loaded! I guess they are auto-loaded by the kernel.


echo "blacklist ip_tables" | tee --append /etc/modprobe.d/blacklist.local.conf
echo "blacklist iptable_filter" | tee --append /etc/modprobe.d/blacklist.local.conf
echo "blacklist ip6_tables" | tee --append /etc/modprobe.d/blacklist.local.conf
echo "blacklist ip6table_filter" | tee --append /etc/modprobe.d/blacklist.local.conf
echo "blacklist nf_tables" | tee --append /etc/modprobe.d/blacklist.local.conf
echo "blacklist nf6_tables" | tee --append /etc/modprobe.d/blacklist.local.conf

depmod -a
update-initramfs -u -k all -v

root@ceph02:~# cat /etc/rc.local
#!/bin/bash -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.

for i in {a..e}; do echo 512 > /sys/block/sd$i/queue/read_ahead_kb; done
for i in {a..d}; do hdparm -q -B 255 -q -W0 /dev/sd$i; done

echo 'on' > '/sys/bus/pci/devices/0000:00:01.0/power/control'
echo 'on' > '/sys/bus/pci/devices/0000:00:03.0/power/control'
echo 'on' > '/sys/bus/pci/devices/0000:00:01.0/power/control'

cpupower frequency-set --governor performance

modprobe -r iptable_filter ip_tables ip6table_filter ip6_tables nf_tables_ipv6 nf_tables_ipv4 nf_tables_bridge nf_tables


array=($(pidof ceph-osd))
taskset -cp 0-5 "${array[0]}"
taskset -cp 12-17 "${array[1]}"
taskset -cp 6-11 "${array[2]}"
taskset -cp 18-23 "${array[3]}"

exit 0
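The four hard-coded taskset lines depend on the order pidof happens to return, which can change across reboots. A slightly more defensive sketch (the CPU ranges assume a 24-core host running 4 OSDs, matching the script above; adjust for your topology):

```shell
# Map the Nth ceph-osd process (1-based) to its CPU range; the range
# order mirrors the original script (0-5, 12-17, 6-11, 18-23).
nth_range() {
    echo "0-5 12-17 6-11 18-23" | cut -d' ' -f"$1"
}

i=1
for pid in $(pidof ceph-osd); do
    taskset -cp "$(nth_range "$i")" "$pid"
    i=$((i + 1))
done
```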


Please also save the pastebin from my OP; there are a lot of benchmark and 
test notes in there.


root@ceph02:~# rados bench -p scbench 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 
4194304 for up to 10 seconds or 0 objects

Object prefix: benchmark_data_ceph02_396172
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)

Re: [ceph-users] slow request and unresponsive kvm guests after upgrading ceph cluster and os, please help debugging

2020-01-06 Thread Paul Emmerich
We've also seen some problems with FileStore on newer kernels; 4.9 is the
last kernel that worked reliably with FileStore in my experience.

But I haven't seen problems with BlueStore related to the kernel version
(well, except for that scrub bug, but my work-around for that is in all
release versions).

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Mon, Jan 6, 2020 at 8:44 PM Jelle de Jong wrote:

> Hello everybody,
>
> I have issues with very slow requests on a simple three-node cluster here,
> four WDC enterprise disks and an Intel Optane NVMe journal on identical
> high-memory nodes, with 10GB networking.
>
> It was working all good with Ceph Hammer on Debian Wheezy, but I wanted
> to upgrade to a supported version and test out bluestore as well. So I
> upgraded to luminous on Debian Stretch and used ceph-volume to create
> bluestore osds, everything went downhill from there.
>
> I went back to filestore on all nodes but I still have slow requests and
> I can not pinpoint a good reason. I tried to debug and gathered
> information to look at:
>
> https://paste.debian.net/hidden/acc5d204/
>
> First I thought it was the balancing that was making things slow, then I
> thought it might be the LVM layer, so I recreated the nodes without LVM
> by switching from ceph-volume to ceph-disk; no difference, still slow
> requests. Then I changed back from bluestore to filestore, but still a
> very slow cluster. Then I thought it was a CPU scheduling issue and
> downgraded the 5.x kernel, and CPU performance is at full speed again. I
> thought maybe there was something weird with an osd and took them out
> one by one, but slow requests are still showing up and client performance
> from vms is really poor.
>
> I just feel a burst of small requests keeps blocking for a while then
> recovers again.
>
> Many thanks for helping out looking at the URL.
>
> If there are options which I should tune for a hdd with nvme journal
> setup please share.
>
> Jelle