Re: [PVE-User] High I/O waits, not sure if it's a ceph issue.

2020-07-03 Thread Mark Schouten
On Thu, Jul 02, 2020 at 03:57:32PM +0200, Alexandre DERUMIER wrote:
> Hi,
> 
> you should give Ceph Octopus a try. librbd has greatly improved for
> writes, and I can now recommend enabling writeback by default.

Yes, that will probably work even better. But what I'm trying to
determine here is whether krbd is a better choice than librbd :)

-- 
Mark Schouten | Tuxis B.V.
KvK: 74698818 | http://www.tuxis.nl/
T: +31 318 200208 | i...@tuxis.nl


Re: [PVE-User] High I/O waits, not sure if it's a ceph issue.

2020-07-02 Thread Alexandre DERUMIER
Hi,

you should give Ceph Octopus a try. librbd has greatly improved for
writes, and I can now recommend enabling writeback by default.


Here are some IOPS results with 1 VM, 1 disk, 4k blocks, iodepth=64, librbd, no
iothread.


              nautilus-cache=none  nautilus-cache=writeback  octopus-cache=none  octopus-cache=writeback
randread 4k   62.1k                25.2k                     61.1k               60.8k
randwrite 4k  27.7k                19.5k                     34.5k               53.0k
seqwrite 4k   7850                 37.5k                     24.9k               82.6k
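
On the Proxmox side the cache mode is a per-disk VM option rather than a Ceph
setting. A minimal sketch of switching an existing disk to writeback (the VM ID,
storage name and disk name below are assumptions, not taken from this thread):

# assumed example: VM 100 with a disk on an RBD storage named "Ceph"
qm set 100 --scsi0 Ceph:vm-100-disk-0,cache=writeback
# or edit /etc/pve/qemu-server/100.conf and append ",cache=writeback" to the disk line;
# the VM needs a full stop/start for the new cache mode to take effect.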




- Original Message -
From: "Mark Schouten" 
To: "proxmoxve" 
Sent: Thursday, 2 July 2020 15:15:20
Subject: Re: [PVE-User] High I/O waits, not sure if it's a ceph issue.

On Thu, Jul 02, 2020 at 09:06:54AM +1000, Lindsay Mathieson wrote: 
> I did some ad-hoc testing last night - definitely a difference, in KRBD's
> favour. Both sequential and random IO were much better with it enabled.

Interesting! I just did some testing too on our demo cluster: Ceph with
6 OSDs over three nodes, size 2.

root@node04:~# pveversion 
pve-manager/6.2-6/ee1d7754 (running kernel: 5.4.41-1-pve) 
root@node04:~# ceph -v 
ceph version 14.2.9 (bed944f8c45b9c98485e99b70e11bbcec6f6659a) nautilus 
(stable) 

rbd create fio_test --size 10G -p Ceph 
rbd create map_test --size 10G -p Ceph 
rbd map Ceph/map_test 

When just using a write test (rw=randwrite) krbd wins, big. 
rbd: 
WRITE: bw=37.9MiB/s (39.8MB/s), 37.9MiB/s-37.9MiB/s (39.8MB/s-39.8MB/s), 
io=10.0GiB (10.7GB), run=269904-269904msec 
krbd: 
WRITE: bw=207MiB/s (217MB/s), 207MiB/s-207MiB/s (217MB/s-217MB/s), io=10.0GiB 
(10.7GB), run=49582-49582msec 

However, using rw=randrw (rwmixread=75), things change a lot: 
rbd: 
READ: bw=49.0MiB/s (52.4MB/s), 49.0MiB/s-49.0MiB/s (52.4MB/s-52.4MB/s), 
io=7678MiB (8051MB), run=153607-153607msec 
WRITE: bw=16.7MiB/s (17.5MB/s), 16.7MiB/s-16.7MiB/s (17.5MB/s-17.5MB/s), 
io=2562MiB (2687MB), run=153607-153607msec 

krbd: 
READ: bw=5511KiB/s (5643kB/s), 5511KiB/s-5511KiB/s (5643kB/s-5643kB/s), 
io=7680MiB (8053MB), run=1426930-1426930msec 
WRITE: bw=1837KiB/s (1881kB/s), 1837KiB/s-1837KiB/s (1881kB/s-1881kB/s), 
io=2560MiB (2685MB), run=1426930-1426930msec 



Maybe I'm interpreting or testing this wrong, but it looks like simply writing
to krbd is much faster, while actually trying to use that data seems slower. Let
me know what you guys think.


Attachments are being stripped, IIRC, so here's the config and the full output 
of the tests: 
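
(The job file itself is not reproduced below, so here is a hedged reconstruction
from the job parameters visible in the results; the pool name, client name and
mapped device path are assumptions.)

[global]
bs=4k
iodepth=32
size=10G

[rbd_write]
ioengine=rbd
clientname=admin
pool=Ceph
rbdname=fio_test
rw=randwrite

[rbd_readwrite]
stonewall
ioengine=rbd
clientname=admin
pool=Ceph
rbdname=fio_test
rw=randrw
rwmixread=75

[krbd_write]
stonewall
ioengine=libaio
direct=1
filename=/dev/rbd/Ceph/map_test
rw=randwrite

[krbd_readwrite]
stonewall
ioengine=libaio
direct=1
filename=/dev/rbd/Ceph/map_test
rw=randrw
rwmixread=75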

RESULTS== 
rbd_write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=rbd, iodepth=32 
rbd_readwrite: (g=1): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=rbd, iodepth=32 
krbd_write: (g=2): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=32 
krbd_readwrite: (g=3): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=32 
fio-3.12 
Starting 4 processes 
Jobs: 1 (f=1): [_(3),f(1)][100.0%][eta 00m:00s] 
rbd_write: (groupid=0, jobs=1): err= 0: pid=1846441: Thu Jul 2 15:08:42 2020 
write: IOPS=9712, BW=37.9MiB/s (39.8MB/s)(10.0GiB/269904msec); 0 zone resets 
slat (nsec): min=943, max=1131.9k, avg=6367.94, stdev=10934.84 
clat (usec): min=1045, max=259066, avg=3286.70, stdev=4553.24 
lat (usec): min=1053, max=259069, avg=3293.06, stdev=4553.20 
clat percentiles (usec): 
| 1.00th=[ 1844], 5.00th=[ 2114], 10.00th=[ 2311], 20.00th=[ 2573], 
| 30.00th=[ 2769], 40.00th=[ 2933], 50.00th=[ 3064], 60.00th=[ 3228], 
| 70.00th=[ 3425], 80.00th=[ 3621], 90.00th=[ 3982], 95.00th=[ 4359], 
| 99.00th=[ 5538], 99.50th=[ 6718], 99.90th=[ 82314], 99.95th=[125305], 
| 99.99th=[187696] 
bw ( KiB/s): min=17413, max=40282, per=83.81%, avg=32561.17, stdev=3777.39, 
samples=539 
iops : min= 4353, max=10070, avg=8139.93, stdev=944.34, samples=539 
lat (msec) : 2=2.64%, 4=87.80%, 10=9.37%, 20=0.08%, 50=0.01% 
lat (msec) : 100=0.02%, 250=0.09%, 500=0.01% 
cpu : usr=8.73%, sys=5.27%, ctx=1254152, majf=0, minf=8484 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
issued rwts: total=0,2621440,0,0 short=0,0,0,0 dropped=0,0,0,0 
latency : target=0, window=0, percentile=100.00%, depth=32 
rbd_readwrite: (groupid=1, jobs=1): err= 0: pid=1852029: Thu Jul 2 15:08:42 
2020 
read: IOPS=12.8k, BW=49.0MiB/s (52.4MB/s)(7678MiB/153607msec) 
slat (nsec): min=315, max=4467.8k, avg=3247.91, stdev=7360.28 
clat (usec): min=276, max=160495, avg=1412.53, stdev=656.11 
lat (usec): min=281, max=160497, avg=1415.78, stdev=656.02 
clat percentiles (usec): 
| 1.00th=[ 494], 5.00th=[ 

Re: [PVE-User] High I/O waits, not sure if it's a ceph issue.

2020-07-01 Thread Lindsay Mathieson

On 30/06/2020 11:09 pm, Mark Schouten wrote:

I think this is incorrect. KRBD uses the kernel driver, which is
usually older than the userland version. Also, upgrading is easier when
not using KRBD.

I'd like to hear that I'm wrong, am I? :)




I did some ad-hoc testing last night - definitely a difference, in KRBD's
favour. Both sequential and random IO were much better with it enabled.


There are some interesting threads online regarding librbd vs KRBD.
Apparently since 12.x, librbd should perform better, but a lot of people
aren't seeing it.


--
Lindsay



Re: [PVE-User] High I/O waits, not sure if it's a ceph issue.

2020-07-01 Thread Lindsay Mathieson

On 2/07/2020 2:54 am, jameslipski via pve-user wrote:

Thank you, there is definitely an improvement using krbd -- not seeing any
I/O waits.


Excellent, glad to hear it.

--
Lindsay



Re: [PVE-User] High I/O waits, not sure if it's a ceph issue.

2020-07-01 Thread jameslipski via pve-user
--- Begin Message ---
Thank you, there is definitely an improvement using krbd -- not seeing any
I/O waits.



‐‐‐ Original Message ‐‐‐
On Tuesday, June 30, 2020 7:47 PM, Lindsay Mathieson 
 wrote:

> On 30/06/2020 11:09 pm, Mark Schouten wrote:
>
> > I think this is incorrect. KRBD uses the kernel driver, which is
> > usually older than the userland version. Also, upgrading is easier when
> > not using KRBD.
>
> Older yes - Luminous (12.x)
>
> But it supports sufficient features and I found it considerably faster
> than the user driver.
>
> 
>
> Lindsay
>


--- End Message ---


Re: [PVE-User] High I/O waits, not sure if it's a ceph issue.

2020-06-30 Thread Lindsay Mathieson

On 30/06/2020 11:09 pm, Mark Schouten wrote:

I think this is incorrect. KRBD uses the kernel driver, which is
usually older than the userland version. Also, upgrading is easier when
not using KRBD.



Older yes - Luminous (12.x)

But it supports sufficient features and I found it considerably faster 
than the user driver.
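
For anyone wanting to check this on their own images: the feature set enabled on
an image determines whether the kernel client can map it. A quick sketch (pool
and image names are assumptions):

# show which features are enabled; the kernel rbd client must support all of them
rbd info Ceph/vm-100-disk-0
# if an older kernel refuses to map the image, the unsupported features can be disabled, e.g.
rbd feature disable Ceph/vm-100-disk-0 object-map fast-diff deep-flatten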


--
Lindsay



Re: [PVE-User] High I/O waits, not sure if it's a ceph issue.

2020-06-30 Thread Lindsay Mathieson

On 30/06/2020 10:07 pm, jameslipski via pve-user wrote:

Before I update Ceph, a question regarding KRBD: I've just enabled it. Do I have
to re-create the pool, restart Ceph, restart the node, etc., or does it just take
effect?



No need to recreate the pool; just stop/start the VMs accessing it.
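
For reference, KRBD is a flag on the Proxmox storage definition, not on the Ceph
pool itself; a minimal sketch, assuming the storage is named Ceph:

# enable the krbd flag on an existing RBD storage definition
pvesm set Ceph --krbd 1
# equivalent to adding "krbd 1" to the matching "rbd: Ceph" entry in /etc/pve/storage.cfg;
# running guests keep their current librbd connection until they are stopped and started.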

--
Lindsay



Re: [PVE-User] High I/O waits, not sure if it's a ceph issue.

2020-06-30 Thread Mark Schouten
On Tue, Jun 30, 2020 at 11:28:51AM +1000, Lindsay Mathieson wrote:
> Do you have KRBD set for the Proxmox Ceph storage? That helps a lot.

I think this is incorrect. KRBD uses the kernel driver, which is
usually older than the userland version. Also, upgrading is easier when
not using KRBD.

I'd like to hear that I'm wrong, am I? :)

-- 
Mark Schouten | Tuxis B.V.
KvK: 74698818 | http://www.tuxis.nl/
T: +31 318 200208 | i...@tuxis.nl


Re: [PVE-User] High I/O waits, not sure if it's a ceph issue.

2020-06-30 Thread jameslipski via pve-user
--- Begin Message ---
Thanks for the reply

All nodes are connected to a 10Gbit switch. Ceph is currently running on 14.2.2
but will be updated to the latest. KRBD was not enabled for the pool.

Before I update Ceph, a question regarding KRBD: I've just enabled it. Do I have
to re-create the pool, restart Ceph, restart the node, etc., or does it just take
effect?

‐‐‐ Original Message ‐‐‐
On Monday, June 29, 2020 9:28 PM, Lindsay Mathieson 
 wrote:

> On 30/06/2020 11:08 am, jameslipski via pve-user wrote:
>
> > Just to give a little bit of background, we currently have 6 nodes. We're
> > running Ceph, and each node consists of 2 OSDs (each node has 2x Intel
> > SSDSC2KG019T8); OSD type is bluestore. The global Ceph configuration (at
> > least as shown in the Proxmox interface) is as follows:
>
> Network config? (i.e. speed etc.)
>
> Ceph is Nautilus 14.2.9? (latest on Proxmox)
>
> Do you have KRBD set for the Proxmox Ceph storage? That helps a lot.
>
> --
>
> Lindsay
>


--- End Message ---


Re: [PVE-User] High I/O waits, not sure if it's a ceph issue.

2020-06-29 Thread Lindsay Mathieson

On 30/06/2020 11:08 am, jameslipski via pve-user wrote:

Just to give a little bit of background, we currently have 6 nodes. We're
running Ceph, and each node consists of 2 OSDs (each node has 2x Intel
SSDSC2KG019T8); OSD type is bluestore. The global Ceph configuration (at least
as shown in the Proxmox interface) is as follows:


Network config? (i.e. speed etc.)


Ceph is Nautilus 14.2.9? (latest on Proxmox) 


Do you have KRBD set for the Proxmox Ceph storage? That helps a lot.

--
Lindsay



[PVE-User] High I/O waits, not sure if it's a ceph issue.

2020-06-29 Thread jameslipski via pve-user
--- Begin Message ---
Greetings,

I'm trying out PVE. Currently I'm just doing tests and ran into an issue 
relating to high I/O waits.

Just to give a little bit of background, we currently have 6 nodes. We're
running Ceph, and each node consists of 2 OSDs (each node has 2x Intel
SSDSC2KG019T8); OSD type is bluestore. The global Ceph configuration (at least
as shown in the Proxmox interface) is as follows:

[global]
auth_client_required = 
auth_cluster_required = 
auth_service_required = 
cluster_network = 10.125.0.0/24
fsid = f64d2a67-98c3-4dbc-abfd-906ea7aaf314
mon_allow_pool_delete = true
mon_host = 10.125.0.101 10.125.0.102 10.125.0.103 10.125.0.105
10.125.0.106 10.125.0.104
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.125.0.0/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

If I'm missing any relevant information relating to my ceph setup (I'm still 
learning this), please let me know.

Each node consists of 2x Xeon E5-2660 v3. Where I ran into high I/O waits is
when running 2 VMs. One VM is a MySQL replication server (using 8 cores) and is
performing mostly writes. The second VM is running Debian with Cacti. Both of
these systems are on two different nodes but use Ceph to store the VM disks. When
I copied files over the network to the VM running Cacti, I noticed high I/O
waits in my MySQL VM.
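
A rough way to see where the waits come from is to watch latency inside the
guest and per-image traffic on the cluster while the copy runs; a sketch,
assuming Nautilus with the rbd_support mgr module enabled (image names will
differ per setup):

# inside the MySQL VM: per-device utilisation and await times (sysstat package)
iostat -x 2
# on a Ceph node: per-image client I/O, Nautilus and later
rbd perf image iostat
# cluster-wide client throughput, the same figures the pgmap lines below report
ceph -s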

I'm assuming that this has something to do with Ceph, though the only thing I'm
seeing in the Ceph logs is the following:

02:43:01.062082 mgr.node01 (mgr.2914449) 8009571 : cluster [DBG] pgmap 
v8009574: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 
TiB avail; 2.4 MiB/s wr, 274 op/s
02:43:03.063137 mgr.node01 (mgr.2914449) 8009572 : cluster [DBG] pgmap 
v8009575: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 
TiB avail; 0 B/s rd, 3.0 MiB/s wr, 380 op/s
02:43:05.064125 mgr.node01 (mgr.2914449) 8009573 : cluster [DBG] pgmap 
v8009576: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 
TiB avail; 0 B/s rd, 2.9 MiB/s wr, 332 op/s
02:43:07.065373 mgr.node01 (mgr.2914449) 8009574 : cluster [DBG] pgmap 
v8009577: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 
TiB avail; 0 B/s rd, 2.7 MiB/s wr, 313 op/s
02:43:09.066210 mgr.node01 (mgr.2914449) 8009575 : cluster [DBG] pgmap 
v8009578: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 
TiB avail; 341 B/s rd, 2.9 MiB/s wr, 350 op/s
02:43:11.066913 mgr.node01 (mgr.2914449) 8009576 : cluster [DBG] pgmap 
v8009579: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 
TiB avail; 341 B/s rd, 3.1 MiB/s wr, 346 op/s
02:43:13.067926 mgr.node01 (mgr.2914449) 8009577 : cluster [DBG] pgmap 
v8009580: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 
TiB avail; 341 B/s rd, 3.5 MiB/s wr, 408 op/s
02:43:15.068834 mgr.node01 (mgr.2914449) 8009578 : cluster [DBG] pgmap 
v8009581: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 
TiB avail; 341 B/s rd, 3.0 MiB/s wr, 320 op/s
02:43:17.069627 mgr.node01 (mgr.2914449) 8009579 : cluster [DBG] pgmap 
v8009582: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 
TiB avail; 341 B/s rd, 2.5 MiB/s wr, 285 op/s
02:43:19.070507 mgr.node01 (mgr.2914449) 8009580 : cluster [DBG] pgmap 
v8009583: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 
TiB avail; 341 B/s rd, 3.0 MiB/s wr, 349 op/s
02:43:21.071241 mgr.node01 (mgr.2914449) 8009581 : cluster [DBG] pgmap 
v8009584: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 
TiB avail; 0 B/s rd, 2.8 MiB/s wr, 319 op/s
02:43:23.072286 mgr.node01 (mgr.2914449) 8009582 : cluster [DBG] pgmap 
v8009585: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 
TiB avail; 2.7 MiB/s wr, 329 op/s
02:43:25.073369 mgr.node01 (mgr.2914449) 8009583 : cluster [DBG] pgmap 
v8009586: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 
TiB avail; 2.8 MiB/s wr, 304 op/s
02:43:27.074315 mgr.node01 (mgr.2914449) 8009584 : cluster [DBG] pgmap 
v8009587: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 
TiB avail; 2.2 MiB/s wr, 262 op/s
02:43:29.075284 mgr.node01 (mgr.2914449) 8009585 : cluster [DBG] pgmap 
v8009588: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 
TiB avail; 682 B/s rd, 2.9 MiB/s wr, 342 op/s
02:43:31.076180 mgr.node01 (mgr.2914449) 8009586 : cluster [DBG] pgmap 
v8009589: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 
TiB avail; 682 B/s rd, 2.4 MiB/s wr, 269 op/s
02:43:33.077523 mgr.node01 (mgr.2914449) 8009587 : cluster [DBG] pgmap 
v8009590: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 
TiB avail; 682 B/s rd, 3.4 MiB/s wr, 389 op/s
02:43:35.078543 mgr.node01 (mgr.2914449) 8009588 : cluster [DBG] pgmap 
v8009591: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 
TiB avail; 682 B/s rd, 3.1 MiB/s wr, 344 op/s
02:43:37.079428