Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
On top I use XFS. Just curious: why did you ask :-) ?

Rainer

On 22.04.20 at 17:03, Alwin Antreich wrote:
> On Wed, Apr 22, 2020 at 12:43:58PM +0200, Rainer Krienke wrote:
>> Hello,
>>
>> there is no single workload, but a bunch of VMs that do a lot of different things, many of which have no special performance demands. The VMs that do need speed are the NFS and SMB file servers.
>>
>> And exactly these servers seem to benefit from the larger block size. The two I tested mostly on are file servers.
> I am curious, so your setup is a striped LVM with what filesystem on top?
>
>> Aside from bonnie++ I also tested writing and reading files, especially many small ones. This also works very well, and something that was not true at the beginning of my search now is: the Proxmox/Ceph solution is faster, and in some disciplines much faster, than the old Xen-based solution with iSCSI storage as a backend.
> Nice to hear. :)
>
> --
> Cheers,
> Alwin

--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
On Wed, Apr 22, 2020 at 12:43:58PM +0200, Rainer Krienke wrote:
> Hello,
>
> there is no single workload, but a bunch of VMs that do a lot of different things, many of which have no special performance demands. The VMs that do need speed are the NFS and SMB file servers.
>
> And exactly these servers seem to benefit from the larger block size. The two I tested mostly on are file servers.
I am curious, so your setup is a striped LVM with what filesystem on top?

> Aside from bonnie++ I also tested writing and reading files, especially many small ones. This also works very well, and something that was not true at the beginning of my search now is: the Proxmox/Ceph solution is faster, and in some disciplines much faster, than the old Xen-based solution with iSCSI storage as a backend.
Nice to hear. :)

--
Cheers,
Alwin
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
Hello,

there is no single workload, but a bunch of VMs that do a lot of different things, many of which have no special performance demands. The VMs that do need speed are the NFS and SMB file servers.

And exactly these servers seem to benefit from the larger block size. The two I tested mostly on are file servers.

Aside from bonnie++ I also tested writing and reading files, especially many small ones. This also works very well, and something that was not true at the beginning of my search now is: the Proxmox/Ceph solution is faster, and in some disciplines much faster, than the old Xen-based solution with iSCSI storage as a backend.

Thanks
Rainer

On 21.04.20 at 16:01, Alwin Antreich wrote:
> On Tue, Apr 21, 2020 at 03:34:47PM +0200, Rainer Krienke wrote:
>> Hello,
>>
>> just wanted to thank you for your help and to tell you that I found the culprit that made my read performance look rather small on a Proxmox VM with an LV based on 4 disks (RBDs). The best result using bonnie++ as a test was about 100 MBytes/sec single-VM read performance.
>>
>> I remembered that right after I switched to LVM striping I had tested the default stripe size LVM would use, which was 64K. I found this value rather small and replaced it with 512K, which increased read and write speed.
>>
>> I remembered this fact and again changed the stripe size, this time to the default RBD object size, which is 4MB. Using this value the read performance went up to 400 MBytes/sec, and if I run two bonnies on two different VMs the total read performance in Ceph is about 800 MBytes/sec. The write performance in this two-VM test is about 1.2 GBytes/sec.
> The 4 MB chunk size is good for Ceph. But how is the intended workload performing on those VMs?
>
> --
> Cheers,
> Alwin

--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
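For reference, the stripe-size change described above would look roughly like this. This is only a sketch; the device names, VG/LV names and size are assumptions, not taken from the thread:

# pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde
# vgcreate vg /dev/sdb /dev/sdc /dev/sdd /dev/sde
# lvcreate -i 4 -I 4m -L 400G -n testlv vg
# mkfs.xfs /dev/vg/testlv

Here "-i 4" stripes the LV across all four RBD-backed PVs, and "-I 4m" sets a 4 MB stripe size, matching the default RBD object size.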
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
On Tue, Apr 21, 2020 at 03:34:47PM +0200, Rainer Krienke wrote:
> Hello,
>
> just wanted to thank you for your help and to tell you that I found the culprit that made my read performance look rather small on a Proxmox VM with an LV based on 4 disks (RBDs). The best result using bonnie++ as a test was about 100 MBytes/sec single-VM read performance.
>
> I remembered that right after I switched to LVM striping I had tested the default stripe size LVM would use, which was 64K. I found this value rather small and replaced it with 512K, which increased read and write speed.
>
> I remembered this fact and again changed the stripe size, this time to the default RBD object size, which is 4MB. Using this value the read performance went up to 400 MBytes/sec, and if I run two bonnies on two different VMs the total read performance in Ceph is about 800 MBytes/sec. The write performance in this two-VM test is about 1.2 GBytes/sec.
The 4 MB chunk size is good for Ceph. But how is the intended workload performing on those VMs?

--
Cheers,
Alwin
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
On Tue, Apr 14, 2020 at 08:15:15PM +0200, Rainer Krienke wrote:
> On 14.04.20 at 18:09, Alwin Antreich wrote:
> >> In a VM I also tried to read its own striped LV device: dd if=/dev/vg/testlv of=/dev/null bs=1024k status=progress (after clearing the VM's cache). /dev/vg/testlv is a striped LV (on 4 disks) with XFS on it, on which I tested the speed using bonnie++ before. This dd also did not go beyond about 100MB/sec, whereas the rados bench promises much more.
> > Do you have a VM without striped volumes? I suppose there will be two requests, one for each half of the data. That could slow down the read as well.
>
> Yes, the logical volume is striped using 4 physical volumes (RBDs). But since exactly this setup helped to boost writing (more parallelism), it should do exactly the same for reads, since blocks can be read from more separate RBD devices and thus more disks in general.
>
> I also tested a VM with just a single RBD used for the VM's disk, and there the effect is quite the same.
>
> > And you can disable the cache to verify that cache misses don't impact the performance.
>
> I tried and disabled the writeback cache, but the effect was only minimal.
It seems that at this point the optimizations need to be done inside the VM (eg. readahead). I think the data that is requested is not in the cache and too small to be fetched within one read operation.

--
Cheers,
Alwin
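A sketch of the kind of in-VM readahead tuning hinted at here; the device paths and values are assumptions to be tuned, not figures from the thread:

# blockdev --setra 16384 /dev/vg/testlv

blockdev takes the readahead in 512-byte sectors, so 16384 means 8 MB. Alternatively, per block device via sysfs (value in KB):

# echo 8192 > /sys/block/sdb/queue/read_ahead_kb

Larger readahead lets sequential reads pull whole RBD objects ahead of time instead of waiting on each small request.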
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
On 14.04.20 at 18:09, Alwin Antreich wrote:
>> In a VM I also tried to read its own striped LV device: dd if=/dev/vg/testlv of=/dev/null bs=1024k status=progress (after clearing the VM's cache). /dev/vg/testlv is a striped LV (on 4 disks) with XFS on it, on which I tested the speed using bonnie++ before. This dd also did not go beyond about 100MB/sec, whereas the rados bench promises much more.
> Do you have a VM without striped volumes? I suppose there will be two requests, one for each half of the data. That could slow down the read as well.

Yes, the logical volume is striped using 4 physical volumes (RBDs). But since exactly this setup helped to boost writing (more parallelism), it should do exactly the same for reads, since blocks can be read from more separate RBD devices and thus more disks in general.

I also tested a VM with just a single RBD used for the VM's disk, and there the effect is quite the same.

> And you can disable the cache to verify that cache misses don't impact the performance.

I tried and disabled the writeback cache, but the effect was only minimal.

Have a nice day
Rainer

--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
On Tue, Apr 14, 2020 at 05:21:44PM +0200, Rainer Krienke wrote:
> On 14.04.20 at 16:42, Alwin Antreich wrote:
> >> According to these numbers the relation between write and read performance should be the other way round: writes should be slower than reads, but on a VM it is exactly the other way round?
> > Ceph does reads in parallel, while writes are done to the primary OSD by the client. And that OSD is responsible for distributing the other copies.
>
> Ah yes, right. The primary OSD has to wait until all the OSDs in the PG have confirmed that data has been written to each of the OSDs. Reads, as you said, are parallel, so I would expect reading to be faster than writing, but for me it is *not* in a Proxmox VM with Ceph RBD storage.
>
> However, reads are faster on the Ceph level in a rados bench run directly on a pxa host (no VM), which is what I would expect also for reads/writes inside a VM.
>
> >> Any idea why nevertheless writes on a VM are ~3 times faster than reads, and what I could try to speed up reading?
> > What is the byte size of bonnie++? If it uses 4 KB and data isn't in the cache, whole objects need to be requested from the cluster.
>
> I did not find information about the block sizes used. The whole file that is written and later on read again by bonnie++ is however by default at least twice the size of your RAM.
According to the man page the chunk size is 8192 bytes by default.

> In a VM I also tried to read its own striped LV device: dd if=/dev/vg/testlv of=/dev/null bs=1024k status=progress (after clearing the VM's cache). /dev/vg/testlv is a striped LV (on 4 disks) with XFS on it, on which I tested the speed using bonnie++ before. This dd also did not go beyond about 100MB/sec, whereas the rados bench promises much more.
Do you have a VM without striped volumes? I suppose there will be two requests, one for each half of the data. That could slow down the read as well.

And you can disable the cache to verify that cache misses don't impact the performance.

--
Cheers,
Alwin
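Disabling the cache for such a test could be done on the host like this; a sketch only, where VMID 100 and the disk spec come from the test VM config posted further down the thread, and cache=none is the assumed setting:

# qm set 100 --scsi1 ceph:vm-100-disk-1,cache=none,size=500G

Re-specifying the existing volume with new options reattaches it; with cache=none, Qemu bypasses its caching layer for this disk, so librbd caching is off as well.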
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
On 14.04.20 at 16:42, Alwin Antreich wrote:
>> According to these numbers the relation between write and read performance should be the other way round: writes should be slower than reads, but on a VM it is exactly the other way round?
> Ceph does reads in parallel, while writes are done to the primary OSD by the client. And that OSD is responsible for distributing the other copies.

Ah yes, right. The primary OSD has to wait until all the OSDs in the PG have confirmed that data has been written to each of the OSDs. Reads, as you said, are parallel, so I would expect reading to be faster than writing, but for me it is *not* in a Proxmox VM with Ceph RBD storage.

However, reads are faster on the Ceph level in a rados bench run directly on a pxa host (no VM), which is what I would expect also for reads/writes inside a VM.

>> Any idea why nevertheless writes on a VM are ~3 times faster than reads, and what I could try to speed up reading?
> What is the byte size of bonnie++? If it uses 4 KB and data isn't in the cache, whole objects need to be requested from the cluster.

I did not find information about the block sizes used. The whole file that is written and later on read again by bonnie++ is however by default at least twice the size of your RAM.

In a VM I also tried to read its own striped LV device: dd if=/dev/vg/testlv of=/dev/null bs=1024k status=progress (after clearing the VM's cache). /dev/vg/testlv is a striped LV (on 4 disks) with XFS on it, on which I tested the speed using bonnie++ before. This dd also did not go beyond about 100MB/sec, whereas the rados bench promises much more.

Thanks
Rainer

--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
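For pinning the chunk size down explicitly instead of relying on the default, the bonnie++ man page documents a colon suffix on -s; a sketch, where the mount point and sizes are assumptions:

# bonnie++ -u root -d /mnt/test -s 8192:65536

This would write an 8 GiB test file using 64 KiB chunks instead of the 8192-byte default, which should make the per-request size comparable to the LVM stripe size being discussed.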
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
On Tue, Apr 14, 2020 at 03:54:30PM +0200, Rainer Krienke wrote:
> Hello,
>
> in between I learned a lot from this group (thanks a lot) to solve many performance problems I initially faced with Proxmox in VMs having their storage on Ceph RBDs.
>
> I parallelized access to many disks on a VM where possible, used iothreads and activated writeback cache.
>
> Running a bonnie++ I am now able to get about 300 MBytes/sec block write performance, which is a great value because it even scales out with Ceph if I run the same bonnie++ on e.g. two machines. In this case I get 600 MBytes/sec. Great.
>
> The last strangeness I am experiencing is read performance. The same bonnie on a VM's XFS filesystem that yields 300MB/sec write performance only gets a block read of about 90MB/sec.
>
> So on one of the pxa hosts, and later also on one of the ceph cluster nodes (Nautilus 14.2.8, 144 OSDs), I ran a rados bench test to see if Ceph is slowing down reads. The results on both systems were very similar. So here is the test result from the pxa host:
>
> # rados bench -p my-rbd 60 write --no-cleanup
> Total time run:         60.284332
> Total writes made:      5376
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     356.71
> Stddev Bandwidth:       46.8361
> Max bandwidth (MB/sec): 424
> Min bandwidth (MB/sec): 160
> Average IOPS:           89
> Stddev IOPS:            11
> Max IOPS:               106
> Min IOPS:               40
> Average Latency(s):     0.179274
> Stddev Latency(s):      0.105626
> Max latency(s):         1.00746
> Min latency(s):         0.0656261
>
> # echo 3 > /proc/sys/vm/drop_caches
> # rados bench -p pxa-rbd 60 seq
> Total time run:       24.208097
> Total reads made:     5376
> Read size:            4194304
> Object size:          4194304
> Bandwidth (MB/sec):   888.298
> Average IOPS:         222
> Stddev IOPS:          33
> Max IOPS:             249
> Min IOPS:             92
> Average Latency(s):   0.0714553
> Max latency(s):       0.63154
> Min latency(s):       0.0237746
>
> According to these numbers the relation between write and read performance should be the other way round: writes should be slower than reads, but on a VM it is exactly the other way round?
Ceph does reads in parallel, while writes are done to the primary OSD by the client. And that OSD is responsible for distributing the other copies.

> Any idea why nevertheless writes on a VM are ~3 times faster than reads, and what I could try to speed up reading?
What is the byte size of bonnie++? If it uses 4 KB and data isn't in the cache, whole objects need to be requested from the cluster.

--
Cheers,
Alwin
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
Hello,

in between I learned a lot from this group (thanks a lot) to solve many performance problems I initially faced with Proxmox in VMs having their storage on Ceph RBDs.

I parallelized access to many disks on a VM where possible, used iothreads and activated writeback cache.

Running a bonnie++ I am now able to get about 300 MBytes/sec block write performance, which is a great value because it even scales out with Ceph if I run the same bonnie++ on e.g. two machines. In this case I get 600 MBytes/sec. Great.

The last strangeness I am experiencing is read performance. The same bonnie on a VM's XFS filesystem that yields 300MB/sec write performance only gets a block read of about 90MB/sec.

So on one of the pxa hosts, and later also on one of the ceph cluster nodes (Nautilus 14.2.8, 144 OSDs), I ran a rados bench test to see if Ceph is slowing down reads. The results on both systems were very similar. So here is the test result from the pxa host:

# rados bench -p my-rbd 60 write --no-cleanup
Total time run:         60.284332
Total writes made:      5376
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     356.71
Stddev Bandwidth:       46.8361
Max bandwidth (MB/sec): 424
Min bandwidth (MB/sec): 160
Average IOPS:           89
Stddev IOPS:            11
Max IOPS:               106
Min IOPS:               40
Average Latency(s):     0.179274
Stddev Latency(s):      0.105626
Max latency(s):         1.00746
Min latency(s):         0.0656261

# echo 3 > /proc/sys/vm/drop_caches
# rados bench -p pxa-rbd 60 seq
Total time run:       24.208097
Total reads made:     5376
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   888.298
Average IOPS:         222
Stddev IOPS:          33
Max IOPS:             249
Min IOPS:             92
Average Latency(s):   0.0714553
Max latency(s):       0.63154
Min latency(s):       0.0237746

According to these numbers the relation between write and read performance should be the other way round: writes should be slower than reads, but on a VM it is exactly the other way round?

Any idea why nevertheless writes on a VM are ~3 times faster than reads, and what I could try to speed up reading?

Thanks a lot
Rainer

--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
Hi Rainer,

On 17/3/20 at 16:58, Rainer Krienke wrote:
> thanks for your answer,
Take into account that I haven't used iothreads; what I told you is what I learned here and elsewhere. Alexandre and Alwin are the experts in this instead ;)

> if I understand you correctly, then iothreads can only help if the VM has more than one disk, hence your proposal to build a raid0 on two rbd devices. The disadvantage of this solution would of course be that disk usage would be doubled.
Not necessarily, just create more, smaller disks. Create a striped raid0 and add it as a PV to LVM, then create the LVs you need. Alwin is right that this will make disk management more complex...

> A fileserver VM I manage (not yet productive) could profit from this. I use LVM on it anyway and I could use striped LVs, so those volumes would read from more VM PV disks.
Should help I guess.

> The host's CPU is an AMD EPYC 7402 24-core processor. Does it make sense to select a specific CPU type for the VM? My test machines have the default kvm64 processor. The number of processors should then probably be at least equal to the number of disks (number of iothreads)?
If all hosts have the same CPU, then use "host" type CPU.

> Do you know if it makes any difference whether I use the VirtIO SCSI driver versus the VirtIO SCSI single driver?
I haven't tried -single, maybe others can comment on this.

Cheers
Eneko

> Thank you very much
> Rainer
>
> On 17.03.20 at 14:10, Eneko Lacunza wrote:
>> Hi,
>>
>> You can try to enable IO threads and assign multiple Ceph disks to the VM, then build some kind of raid0 to increase performance.
>>
>> Generally speaking, an SSD-based Ceph cluster is considered to perform well when a VM gets about 2000 IOPS, and factors like CPU single-thread performance, network and disk have to be selected with care. Also server energy saving disabled, etc.
>>
>> What CPUs in those 9 nodes?
>>
>> Ceph is built for parallel access and scaling. You're only using 1 thread of your VM for disk IO currently.
>>
>> Cheers
>> Eneko
>>
>> On 17/3/20 at 14:04, Rainer Krienke wrote:
>>> Hello,
>>>
>>> I run a pve 6.1-7 cluster with 5 nodes that is attached (via 10Gb network) to a Ceph Nautilus cluster with 9 ceph nodes and 144 magnetic disks. The pool with RBD images for disk storage is erasure coded with a 4+2 profile.
>>>
>>> I ran some performance tests since I noticed that there seems to be a strange limit to the disk read/write rate on a single VM, even if the physical machine hosting the VM as well as the cluster in total are capable of doing much more.
>>>
>>> So what I did was to run a bonnie++ as well as a dd read/write test, first in parallel on 10 VMs, then on 5 VMs and at last on a single one.
>>>
>>> A value of "75" for "bo++rd" in the first line below means that each of the 10 bonnie++ processes running on 10 different Proxmox VMs in parallel reported on average over all the results a value of 75 MBytes/sec for "block read". The ceph values are the peaks measured by Ceph itself during the test run (all rd/wr values in MBytes/sec):
>>>
>>> VM-count:  bo++rd:  bo++wr:  ceph(rd/wr):  dd-rd:  dd-wr:  ceph(rd/wr):
>>> 10         75       42       540/485       55      58      698/711
>>> 5          90       62       310/338       47      80      248/421
>>> 1          108      114      111/120       130     145     337/165
>>>
>>> What I find a little strange is that running many VMs doing IO in parallel I reach a write rate of about 485-711 MBytes/sec. However, when running a single VM the maximum is at 120-165 MBytes/sec. Since the whole networking is based on a 10Gb infrastructure and an iperf test between a VM and a ceph node reported nearly 10Gb, I would expect a higher rate for the single VM. Even if I run a test with 5 VMs on *one* physical host (values not shown above), the results are not far behind the values for 5 VMs on 5 hosts shown above. So the single host seems not to be the limiting factor; the VM itself is limiting IO.
>>>
>>> What rates do you find on your proxmox/ceph cluster for single VMs? Does anyone have any explanation for this rather big difference, or perhaps an idea what to try in order to get higher IO rates from a single VM?
>>>
>>> Thank you very much in advance
>>> Rainer
>>>
>>> -
>>> Here are the more detailed test results for anyone interested:
>>>
>>> Using bonnie++:
>>> 10 VMs (two on each of the 5 hosts), VMs: 4GB RAM, BTRFS, cd /root; bonnie++ -u root
>>> Average for each VM:
>>> block write: ~42MByte/sec, block read: ~75MByte/sec
>>> ceph: total peak: 485MByte/sec write, 540MByte/sec read
>>>
>>> 5 VMs (one on each of the 5 hosts), 4GB RAM, BTRFS, cd /root; bonnie++ -u root
>>> Average for each VM:
>>> block write: ~62MByte/sec, block read: ~90MByte/sec
>>> ceph: total peak: 338MByte/sec write, 310MByte/sec read
>>>
>>> 1 VM, 4GB RAM, BTRFS, cd /root; bonnie++ -u root
>>> Average for VM:
>>> block write: ~114MByte/sec, block read: ~108MByte/sec
>>> ceph: total peak: 120MByte/sec write, 111MByte/sec read
>>>
>>> Using dd:
>>> 10 VMs (two on each of the 5 hosts), VMs: 4GB RAM, write on a ceph based vm-disk "sdb"
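Eneko's "striped raid0 as a PV" idea could be sketched like this inside the VM; the device names and VG name are assumptions, and mdadm raid0 is one way to do it (LVM striping, as discussed later in the thread, is another):

# mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sd[b-e]
# pvcreate /dev/md0
# vgextend vg /dev/md0

The raid0 device is handed to LVM as a single PV, so existing LVs and their management stay unchanged while reads and writes fan out over four RBDs.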
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
>> What rates do you find on your proxmox/ceph cluster for single VMs?
With replica x3 and 4k block random read/write with a big queue depth, I'm around 70k iops read && 40k iops write (per VM disk, if iothread is used; the limitation is the CPU usage of 1 thread/core per disk).

With queue depth=1, I'm around 4000-5000 iops (because of network latency + CPU latency). This is with client/server on 3GHz Intel CPUs.

----- Original Mail -----
From: "Rainer Krienke"
To: "proxmoxve"
Sent: Tuesday, 17 March 2020 14:04:22
Subject: [PVE-User] Proxmox with ceph storage VM performance strangeness

Hello,

I run a pve 6.1-7 cluster with 5 nodes that is attached (via 10Gb network) to a Ceph Nautilus cluster with 9 ceph nodes and 144 magnetic disks. The pool with RBD images for disk storage is erasure coded with a 4+2 profile.

I ran some performance tests since I noticed that there seems to be a strange limit to the disk read/write rate on a single VM, even if the physical machine hosting the VM as well as the cluster in total are capable of doing much more.

So what I did was to run a bonnie++ as well as a dd read/write test, first in parallel on 10 VMs, then on 5 VMs and at last on a single one.

A value of "75" for "bo++rd" in the first line below means that each of the 10 bonnie++ processes running on 10 different Proxmox VMs in parallel reported on average over all the results a value of 75 MBytes/sec for "block read". The ceph values are the peaks measured by Ceph itself during the test run (all rd/wr values in MBytes/sec):

VM-count:  bo++rd:  bo++wr:  ceph(rd/wr):  dd-rd:  dd-wr:  ceph(rd/wr):
10         75       42       540/485       55      58      698/711
5          90       62       310/338       47      80      248/421
1          108      114      111/120       130     145     337/165

What I find a little strange is that running many VMs doing IO in parallel I reach a write rate of about 485-711 MBytes/sec. However, when running a single VM the maximum is at 120-165 MBytes/sec. Since the whole networking is based on a 10Gb infrastructure and an iperf test between a VM and a ceph node reported nearly 10Gb, I would expect a higher rate for the single VM. Even if I run a test with 5 VMs on *one* physical host (values not shown above), the results are not far behind the values for 5 VMs on 5 hosts shown above. So the single host seems not to be the limiting factor; the VM itself is limiting IO.

What rates do you find on your proxmox/ceph cluster for single VMs? Does anyone have any explanation for this rather big difference, or perhaps an idea what to try in order to get higher IO rates from a single VM?

Thank you very much in advance
Rainer

-
Here are the more detailed test results for anyone interested:

Using bonnie++:
10 VMs (two on each of the 5 hosts), VMs: 4GB RAM, BTRFS, cd /root; bonnie++ -u root
Average for each VM:
block write: ~42MByte/sec, block read: ~75MByte/sec
ceph: total peak: 485MByte/sec write, 540MByte/sec read

5 VMs (one on each of the 5 hosts), 4GB RAM, BTRFS, cd /root; bonnie++ -u root
Average for each VM:
block write: ~62MByte/sec, block read: ~90MByte/sec
ceph: total peak: 338MByte/sec write, 310MByte/sec read

1 VM, 4GB RAM, BTRFS, cd /root; bonnie++ -u root
Average for VM:
block write: ~114MByte/sec, block read: ~108MByte/sec
ceph: total peak: 120MByte/sec write, 111MByte/sec read

Using dd:
10 VMs (two on each of the 5 hosts), VMs: 4GB RAM, write on a ceph based vm-disk "sdb" (rbd)
write: dd if=/dev/zero of=/dev/sdb bs=nnn count=kkk conv=fsync status=progress
read: dd of=/dev/null if=/dev/sdb bs=nnn count=kkk status=progress
Average for each VM:
bs=1024k count=12000: dd write: ~58MByte/sec, dd read: ~48MByte/sec
bs=4096k count=3000: dd write: ~59MByte/sec, dd read: ~55MByte/sec
ceph: total peak: 711MByte/sec write, 698MByte/sec read

5 VMs (one on each of the 5 hosts), VMs: 4GB RAM, write on a ceph based vm-disk "sdb" (rbd)
write: dd if=/dev/zero of=/dev/sdb bs=4096k count=3000 conv=fsync status=progress
read: dd of=/dev/null if=/dev/sdb bs=4096k count=3000 status=progress
Average for each VM:
bs=4096k count=3000: dd write: ~80MByte/sec, dd read: ~47MByte/sec
ceph: total peak: 421MByte/sec write, 248MByte/sec read

1 VM: 4GB RAM, write on a ceph based vm-disk "sdb" (rbd device)
write: dd if=/dev/zero of=/dev/sdb bs=4096k count=3000 conv=fsync status=progress
read: dd of=/dev/null if=/dev/sdb bs=4096k count=3000 status=progress
Average for the VM:
bs=4096k count=3000: dd write: ~145MByte/sec, dd read: ~130MByte/sec
ceph: total peak: 165MByte/sec write, 337MByte/sec read

--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
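The queue-depth dependence Alexandre describes can be reproduced with fio inside a VM; a sketch, where /dev/sdb is an assumed scratch disk (randread only, so its contents are not touched):

# fio --name=qd32 --filename=/dev/sdb --ioengine=libaio --direct=1 --rw=randread --bs=4k --iodepth=32 --runtime=60 --time_based
# fio --name=qd1 --filename=/dev/sdb --ioengine=libaio --direct=1 --rw=randread --bs=4k --iodepth=1 --runtime=60 --time_based

At iodepth=1 each 4k request pays the full network and CPU latency before the next one is issued, which is why the qd=1 numbers are so much lower than the high-queue-depth ones.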
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
On Tue, Mar 17, 2020 at 05:07:47PM +0100, Rainer Krienke wrote:
> Hello Alwin,
>
> thank you for your reply.
>
> The test VM's config is this one. It only has the system disk as well as a disk I added for my test, writing on the device with dd:
>
> agent: 1
> bootdisk: scsi0
> cores: 2
> cpu: kvm64
If possible, set host as CPU type. It exposes all extensions of the CPU model to the VM. But you will need the same CPU model on all the nodes. Otherwise try to find a model with a common set of features.

> ide2: none,media=cdrom
> memory: 4096
With more memory for the VM, you could also tune the caching inside the VM.

> name: pxaclient1
> net0: virtio=52:24:28:e9:18:24,bridge=vmbr1,firewall=1
> numa: 0
> ostype: l26
> scsi0: ceph:vm-100-disk-0,size=32G
> scsi1: ceph:vm-100-disk-1,size=500G
Use cache=writeback; Qemu caching modes translate to the Ceph cache. With writeback, Ceph activates librbd caching (default 25 MB).

> scsihw: virtio-scsi-pci
> serial0: socket
> smbios1: uuid=c57eb716-8188-485b-89cb-35d41dbf3fc1
> sockets: 2
If it is a NUMA system, then best also activate the NUMA flag, as KVM then tries to run the two threads (cores) on the same node.

> This is, as said, only a test machine. As I already wrote to Eneko, I have some server VMs where I could parallelize IO by using striped LVs; at the moment these LVs are not striped. But of course it would also help if in the long run there was a way to lift the "one disk" IO bottleneck.
Yes, I have seen. But this will make backups and managing the disks harder.

--
Cheers,
Alwin
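Applied to the config above, these suggestions would translate to something like the following; a sketch, where VMID 100 and the disk spec come from the posted config and the memory value is an arbitrary example:

# qm set 100 --cpu host --numa 1 --memory 8192
# qm set 100 --scsi1 ceph:vm-100-disk-1,cache=writeback,size=500G

The first command switches the CPU type to host, enables the NUMA flag and doubles the RAM; the second reattaches the test disk with writeback caching so librbd caching kicks in.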
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
Hello Alwin,

thank you for your reply.

The test VM's config is this one. It only has the system disk as well as a disk I added for my test, writing on the device with dd:

agent: 1
bootdisk: scsi0
cores: 2
cpu: kvm64
ide2: none,media=cdrom
memory: 4096
name: pxaclient1
net0: virtio=52:24:28:e9:18:24,bridge=vmbr1,firewall=1
numa: 0
ostype: l26
scsi0: ceph:vm-100-disk-0,size=32G
scsi1: ceph:vm-100-disk-1,size=500G
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=c57eb716-8188-485b-89cb-35d41dbf3fc1
sockets: 2

This is, as said, only a test machine. As I already wrote to Eneko, I have some server VMs where I could parallelize IO by using striped LVs; at the moment these LVs are not striped. But of course it would also help if in the long run there was a way to lift the "one disk" IO bottleneck.

Thank you very much
Rainer

On 17.03.20 at 15:26, Alwin Antreich wrote:
> Hello Rainer,
>
> On Tue, Mar 17, 2020 at 02:04:22PM +0100, Rainer Krienke wrote:
>> Hello,
>>
>> I run a pve 6.1-7 cluster with 5 nodes that is attached (via 10Gb network) to a Ceph Nautilus cluster with 9 ceph nodes and 144 magnetic disks. The pool with RBD images for disk storage is erasure coded with a 4+2 profile.
>>
>> I ran some performance tests since I noticed that there seems to be a strange limit to the disk read/write rate on a single VM, even if the physical machine hosting the VM as well as the cluster in total are capable of doing much more.
>>
>> So what I did was to run a bonnie++ as well as a dd read/write test, first in parallel on 10 VMs, then on 5 VMs and at last on a single one.
>>
>> A value of "75" for "bo++rd" in the first line below means that each of the 10 bonnie++ processes running on 10 different Proxmox VMs in parallel reported on average over all the results a value of 75 MBytes/sec for "block read". The ceph values are the peaks measured by Ceph itself during the test run (all rd/wr values in MBytes/sec):
>>
>> VM-count:  bo++rd:  bo++wr:  ceph(rd/wr):  dd-rd:  dd-wr:  ceph(rd/wr):
>> 10         75       42       540/485       55      58      698/711
>> 5          90       62       310/338       47      80      248/421
>> 1          108      114      111/120       130     145     337/165

--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
Hello,

thanks for your answer,

if I understand you correctly, then iothreads can only help if the VM has more than one disk, hence your proposal to build a raid0 on two rbd devices. The disadvantage of this solution would of course be that disk usage would be doubled.

A fileserver VM I manage (not yet productive) could profit from this. I use LVM on it anyway and I could use striped LVs, so those volumes would read from more VM PV disks. Should help I guess.

The host's CPU is an AMD EPYC 7402 24-core processor. Does it make sense to select a specific CPU type for the VM? My test machines have the default kvm64 processor. The number of processors should then probably be at least equal to the number of disks (number of iothreads)?

Do you know if it makes any difference whether I use the VirtIO SCSI driver versus the VirtIO SCSI single driver?

Thank you very much
Rainer

On 17.03.20 at 14:10, Eneko Lacunza wrote:
> Hi,
>
> You can try to enable IO threads and assign multiple Ceph disks to the VM, then build some kind of raid0 to increase performance.
>
> Generally speaking, an SSD-based Ceph cluster is considered to perform well when a VM gets about 2000 IOPS, and factors like CPU single-thread performance, network and disk have to be selected with care. Also server energy saving disabled, etc.
>
> What CPUs in those 9 nodes?
>
> Ceph is built for parallel access and scaling. You're only using 1 thread of your VM for disk IO currently.
>
> Cheers
> Eneko
>
> On 17/3/20 at 14:04, Rainer Krienke wrote:
>> Hello,
>>
>> I run a pve 6.1-7 cluster with 5 nodes that is attached (via 10Gb network) to a Ceph Nautilus cluster with 9 ceph nodes and 144 magnetic disks. The pool with RBD images for disk storage is erasure coded with a 4+2 profile.
>>
>> I ran some performance tests since I noticed that there seems to be a strange limit to the disk read/write rate on a single VM, even if the physical machine hosting the VM as well as the cluster in total are capable of doing much more.
>>
>> So what I did was to run a bonnie++ as well as a dd read/write test, first in parallel on 10 VMs, then on 5 VMs and at last on a single one.
>>
>> A value of "75" for "bo++rd" in the first line below means that each of the 10 bonnie++ processes running on 10 different Proxmox VMs in parallel reported on average over all the results a value of 75 MBytes/sec for "block read". The ceph values are the peaks measured by Ceph itself during the test run (all rd/wr values in MBytes/sec):
>>
>> VM-count:  bo++rd:  bo++wr:  ceph(rd/wr):  dd-rd:  dd-wr:  ceph(rd/wr):
>> 10         75       42       540/485       55      58      698/711
>> 5          90       62       310/338       47      80      248/421
>> 1          108      114      111/120       130     145     337/165
>>
>> What I find a little strange is that running many VMs doing IO in parallel I reach a write rate of about 485-711 MBytes/sec. However, when running a single VM the maximum is at 120-165 MBytes/sec. Since the whole networking is based on a 10Gb infrastructure and an iperf test between a VM and a ceph node reported nearly 10Gb, I would expect a higher rate for the single VM. Even if I run a test with 5 VMs on *one* physical host (values not shown above), the results are not far behind the values for 5 VMs on 5 hosts shown above. So the single host seems not to be the limiting factor; the VM itself is limiting IO.
>>
>> What rates do you find on your proxmox/ceph cluster for single VMs?
>> Does anyone have any explanation for this rather big difference, or perhaps an idea what to try in order to get higher IO rates from a single VM?
>>
>> Thank you very much in advance
>> Rainer
>>
>> -
>> Here are the more detailed test results for anyone interested:
>>
>> Using bonnie++:
>> 10 VMs (two on each of the 5 hosts), VMs: 4GB RAM, BTRFS, cd /root; bonnie++ -u root
>> Average for each VM:
>> block write: ~42MByte/sec, block read: ~75MByte/sec
>> ceph: total peak: 485MByte/sec write, 540MByte/sec read
>>
>> 5 VMs (one on each of the 5 hosts), 4GB RAM, BTRFS, cd /root; bonnie++ -u root
>> Average for each VM:
>> block write: ~62MByte/sec, block read: ~90MByte/sec
>> ceph: total peak: 338MByte/sec write, 310MByte/sec read
>>
>> 1 VM, 4GB RAM, BTRFS, cd /root; bonnie++ -u root
>> Average for VM:
>> block write: ~114MByte/sec, block read: ~108MByte/sec
>> ceph: total peak: 120MByte/sec write, 111MByte/sec read
>>
>> Using dd:
>> 10 VMs (two on each of the 5 hosts), VMs: 4GB RAM, write on a ceph based vm-disk "sdb" (rbd)
>> write: dd if=/dev/zero of=/dev/sdb bs=nnn count=kkk conv=fsync status=progress
>> read: dd of=/dev/null if=/dev/sdb bs=nnn count=kkk status=progress
>> Average for each VM:
>> bs=1024k count=12000: dd write: ~58MByte/sec, dd read: ~48MByte/sec
>> bs=4096k count=3000: dd write: ~59MByte/sec, dd read: ~55MByte/sec
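On the VirtIO SCSI vs VirtIO SCSI single question raised above: with the single variant each disk gets its own virtual controller, which is what makes a per-disk iothread effective. A sketch of such a config change, with VMID 100 and the disk spec assumed from the test VM posted elsewhere in the thread:

# qm set 100 --scsihw virtio-scsi-single
# qm set 100 --scsi1 ceph:vm-100-disk-1,iothread=1,size=500G

With several disks configured this way, each one can submit IO from its own thread instead of sharing a single Qemu IO thread.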
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
Hello Rainer,

On Tue, Mar 17, 2020 at 02:04:22PM +0100, Rainer Krienke wrote:
> Hello,
>
> I run a pve 6.1-7 cluster with 5 nodes that is attached (via 10Gb network) to a Ceph Nautilus cluster with 9 ceph nodes and 144 magnetic disks. The pool with RBD images for disk storage is erasure coded with a 4+2 profile.
>
> I ran some performance tests since I noticed that there seems to be a strange limit to the disk read/write rate on a single VM, even if the physical machine hosting the VM as well as the cluster in total are capable of doing much more.
>
> So what I did was to run a bonnie++ as well as a dd read/write test, first in parallel on 10 VMs, then on 5 VMs and at last on a single one.
>
> A value of "75" for "bo++rd" in the first line below means that each of the 10 bonnie++ processes running on 10 different Proxmox VMs in parallel reported on average over all the results a value of 75 MBytes/sec for "block read". The ceph values are the peaks measured by Ceph itself during the test run (all rd/wr values in MBytes/sec):
>
> VM-count:  bo++rd:  bo++wr:  ceph(rd/wr):  dd-rd:  dd-wr:  ceph(rd/wr):
> 10         75       42       540/485       55      58      698/711
> 5          90       62       310/338       47      80      248/421
> 1          108      114      111/120       130     145     337/165
>
> What I find a little strange is that running many VMs doing IO in parallel I reach a write rate of about 485-711 MBytes/sec. However, when running a single VM the maximum is at 120-165 MBytes/sec. Since the whole networking is based on a 10Gb infrastructure and an iperf test between a VM and a ceph node reported nearly 10Gb, I would expect a higher rate for the single VM. Even if I run a test with 5 VMs on *one* physical host (values not shown above), the results are not far behind the values for 5 VMs on 5 hosts shown above. So the single host seems not to be the limiting factor; the VM itself is limiting IO.
>
> What rates do you find on your proxmox/ceph cluster for single VMs?
> Does anyone have any explanation for this rather big difference, or perhaps an idea what to try in order to get higher IO rates from a single VM?
>
> Thank you very much in advance
> Rainer
As Eneko said, single thread vs multiple threads. How are your VMs configured (qm config <vmid>)?

> -
> Here are the more detailed test results for anyone interested:
>
> Using bonnie++:
> 10 VMs (two on each of the 5 hosts), VMs: 4GB RAM, BTRFS, cd /root; bonnie++ -u root
> Average for each VM:
> block write: ~42MByte/sec, block read: ~75MByte/sec
> ceph: total peak: 485MByte/sec write, 540MByte/sec read
>
> 5 VMs (one on each of the 5 hosts), 4GB RAM, BTRFS, cd /root; bonnie++ -u root
> Average for each VM:
> block write: ~62MByte/sec, block read: ~90MByte/sec
> ceph: total peak: 338MByte/sec write, 310MByte/sec read
>
> 1 VM, 4GB RAM, BTRFS, cd /root; bonnie++ -u root
> Average for VM:
> block write: ~114MByte/sec, block read: ~108MByte/sec
> ceph: total peak: 120MByte/sec write, 111MByte/sec read
How did you configure bonnie? And a CoW filesystem on top of Ceph will certainly drop performance.

> Using dd:
> 10 VMs (two on each of the 5 hosts), VMs: 4GB RAM, write on a ceph based vm-disk "sdb" (rbd)
> write: dd if=/dev/zero of=/dev/sdb bs=nnn count=kkk conv=fsync status=progress
> read: dd of=/dev/null if=/dev/sdb bs=nnn count=kkk status=progress
> Average for each VM:
> bs=1024k count=12000: dd write: ~58MByte/sec, dd read: ~48MByte/sec
> bs=4096k count=3000: dd write: ~59MByte/sec, dd read: ~55MByte/sec
> ceph: total peak: 711MByte/sec write, 698MByte/sec read
>
> 5 VMs (one on each of the 5 hosts), VMs: 4GB RAM, write on a ceph based vm-disk "sdb" (rbd)
> write: dd if=/dev/zero of=/dev/sdb bs=4096k count=3000 conv=fsync status=progress
> read: dd of=/dev/null if=/dev/sdb bs=4096k count=3000 status=progress
> Average for each VM:
> bs=4096k count=3000: dd write: ~80MByte/sec, dd read: ~47MByte/sec
> ceph: total peak: 421MByte/sec write, 248MByte/sec read
>
> 1 VM: 4GB RAM, write on a ceph based vm-disk "sdb" (rbd device)
> write: dd if=/dev/zero of=/dev/sdb bs=4096k count=3000 conv=fsync status=progress
> read: dd of=/dev/null if=/dev/sdb bs=4096k count=3000 status=progress
> Average for the VM:
> bs=4096k count=3000: dd write: ~145MByte/sec, dd read: ~130MByte/sec
> ceph: total peak: 165MByte/sec write, 337MByte/sec read
dd is not well suited for performance benchmarking. Better use bonnie++ or FIO; the latter is good for storage benchmarks in general.

Now that you have the results from the top-most layer: how do the lower layers perform? E.g. FIO has a built-in rbd engine and is able to talk directly to an rbd image. For Ceph pool performance, a rados bench can surely shed some light.

--
Cheers,
Alwin
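The fio rbd engine mentioned here talks to an image directly, with no VM or kernel block layer in between; a sketch, where the pool, client and image names are placeholders taken loosely from elsewhere in the thread:

# fio --name=rbdread --ioengine=rbd --clientname=admin --pool=my-rbd --rbdname=vm-100-disk-1 --rw=read --bs=4M --iodepth=16

Comparing this against the same pattern run inside the VM separates librbd/cluster performance from the Qemu and guest OS layers.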
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
Hi,

You can try to enable IO threads and assign multiple Ceph disks to the VM, then build some kind of raid0 to increase performance.

Generally speaking, an SSD-based Ceph cluster is considered to perform well when a VM gets about 2000 IOPS, and factors like CPU single-thread performance, network and disk have to be selected with care. Also server energy saving disabled, etc.

What CPUs in those 9 nodes?

Ceph is built for parallel access and scaling. You're only using 1 thread of your VM for disk IO currently.

Cheers
Eneko

On 17/3/20 at 14:04, Rainer Krienke wrote:
> Hello,
>
> I run a pve 6.1-7 cluster with 5 nodes that is attached (via 10Gb network) to a Ceph Nautilus cluster with 9 ceph nodes and 144 magnetic disks. The pool with RBD images for disk storage is erasure coded with a 4+2 profile.
>
> I ran some performance tests since I noticed that there seems to be a strange limit to the disk read/write rate on a single VM, even if the physical machine hosting the VM as well as the cluster in total are capable of doing much more.
>
> So what I did was to run a bonnie++ as well as a dd read/write test, first in parallel on 10 VMs, then on 5 VMs and at last on a single one.
>
> A value of "75" for "bo++rd" in the first line below means that each of the 10 bonnie++ processes running on 10 different Proxmox VMs in parallel reported on average over all the results a value of 75 MBytes/sec for "block read". The ceph values are the peaks measured by Ceph itself during the test run (all rd/wr values in MBytes/sec):
>
> VM-count:  bo++rd:  bo++wr:  ceph(rd/wr):  dd-rd:  dd-wr:  ceph(rd/wr):
> 10         75       42       540/485       55      58      698/711
> 5          90       62       310/338       47      80      248/421
> 1          108      114      111/120       130     145     337/165
>
> What I find a little strange is that running many VMs doing IO in parallel I reach a write rate of about 485-711 MBytes/sec. However, when running a single VM the maximum is at 120-165 MBytes/sec. Since the whole networking is based on a 10Gb infrastructure and an iperf test between a VM and a ceph node reported nearly 10Gb, I would expect a higher rate for the single VM. Even if I run a test with 5 VMs on *one* physical host (values not shown above), the results are not far behind the values for 5 VMs on 5 hosts shown above. So the single host seems not to be the limiting factor; the VM itself is limiting IO.
>
> What rates do you find on your proxmox/ceph cluster for single VMs? Does anyone have any explanation for this rather big difference, or perhaps an idea what to try in order to get higher IO rates from a single VM?
>
> Thank you very much in advance
> Rainer
>
> -
> Here are the more detailed test results for anyone interested:
>
> Using bonnie++:
> 10 VMs (two on each of the 5 hosts), VMs: 4GB RAM, BTRFS, cd /root; bonnie++ -u root
> Average for each VM:
> block write: ~42MByte/sec, block read: ~75MByte/sec
> ceph: total peak: 485MByte/sec write, 540MByte/sec read
>
> 5 VMs (one on each of the 5 hosts), 4GB RAM, BTRFS, cd /root; bonnie++ -u root
> Average for each VM:
> block write: ~62MByte/sec, block read: ~90MByte/sec
> ceph: total peak: 338MByte/sec write, 310MByte/sec read
>
> 1 VM, 4GB RAM, BTRFS, cd /root; bonnie++ -u root
> Average for VM:
> block write: ~114MByte/sec, block read: ~108MByte/sec
> ceph: total peak: 120MByte/sec write, 111MByte/sec read
>
> Using dd:
> 10 VMs (two on each of the 5 hosts), VMs: 4GB RAM, write on a ceph based vm-disk "sdb" (rbd)
> write: dd if=/dev/zero of=/dev/sdb bs=nnn count=kkk conv=fsync status=progress
> read: dd of=/dev/null if=/dev/sdb bs=nnn count=kkk status=progress
> Average for each VM:
> bs=1024k count=12000: dd write: ~58MByte/sec, dd read: ~48MByte/sec
> bs=4096k count=3000: dd write: ~59MByte/sec, dd read: ~55MByte/sec
> ceph: total peak: 711MByte/sec write, 698MByte/sec read
>
> 5 VMs (one on each of the 5 hosts), VMs: 4GB RAM, write on a ceph based vm-disk "sdb" (rbd)
> write: dd if=/dev/zero of=/dev/sdb bs=4096k count=3000 conv=fsync status=progress
> read: dd of=/dev/null if=/dev/sdb bs=4096k count=3000 status=progress
> Average for each VM:
> bs=4096k count=3000: dd write: ~80MByte/sec, dd read: ~47MByte/sec
> ceph: total peak: 421MByte/sec write, 248MByte/sec read
>
> 1 VM: 4GB RAM, write on a ceph based vm-disk "sdb" (rbd device)
> write: dd if=/dev/zero of=/dev/sdb bs=4096k count=3000 conv=fsync status=progress
> read: dd of=/dev/null if=/dev/sdb bs=4096k count=3000 status=progress
> Average for the VM:
> bs=4096k count=3000: dd write: ~145MByte/sec, dd read: ~130MByte/sec
> ceph: total peak: 165MByte/sec write, 337MByte/sec read

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarragako bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es