Re: [PVE-User] Create secondary pool on ceph servers..
Oh! Sorry Alwin. I had some urgency to do this, so this is what I did...
First, I inserted all the disks, both SAS and SSD, into the OSD tree. Then
I checked whether the system detected the SSDs as ssd and the SAS disks as
hdd, but there was no difference: it showed all of them as hdd! So I
changed the class with these commands:

ceph osd crush rm-device-class osd.7
ceph osd crush set-device-class ssd osd.7
ceph osd crush rm-device-class osd.8
ceph osd crush set-device-class ssd osd.8
ceph osd crush rm-device-class osd.12
ceph osd crush set-device-class ssd osd.12
ceph osd crush rm-device-class osd.13
ceph osd crush set-device-class ssd osd.13
ceph osd crush rm-device-class osd.14
ceph osd crush set-device-class ssd osd.14

After that, "ceph osd crush tree --show-shadow" showed the different device
classes:

# ceph osd crush tree --show-shadow
ID  CLASS WEIGHT   TYPE NAME
-24 ssd    4.36394 root default~ssd
-20 ssd          0     host pve1~ssd
-21 ssd          0     host pve2~ssd
-17 ssd    0.87279     host pve3~ssd
  7 ssd    0.87279         osd.7
-18 ssd    0.87279     host pve4~ssd
  8 ssd    0.87279         osd.8
-19 ssd    0.87279     host pve5~ssd
 12 ssd    0.87279         osd.12
-22 ssd    0.87279     host pve6~ssd
 13 ssd    0.87279         osd.13
-23 ssd    0.87279     host pve7~ssd
 14 ssd    0.87279         osd.14
 -2 hdd   12.00282 root default~hdd
-10 hdd    1.09129     host pve1~hdd
  0 hdd    1.09129         osd.0
 .
 .

Then I created the rule:

ceph osd crush rule create-replicated SSDPOOL default host ssd

Then I created a pool named SSDs and assigned the new rule to it:

ceph osd pool set SSDs crush_rule SSDPOOL

It seems to work properly... What do you think?

---
Gilberto Nunes Ferreira

On Tue, Apr 14, 2020 at 15:30, Alwin Antreich wrote:

> On Tue, Apr 14, 2020 at 02:35:55PM -0300, Gilberto Nunes wrote:
> > Hi there
> >
> > I have 7 servers with PVE 6, all updated...
> > All servers are named pve1, pve2 and so on...
> > pve3, pve4 and pve5 have 960 GB SSDs.
> > So we decided to create a second pool that will use only these SSDs.
> > I have read "Ceph CRUSH & device classes" in order to do that!
> > So just to do things right, I need to check this:
> > 1 - first create OSDs with all disks, SAS and SSD
> > 2 - second create a different pool with the commands below:
> >
> > ruleset:
> >
> > ceph osd crush rule create-replicated
> >
> > create pool:
> >
> > ceph osd pool set crush_rule
> >
> > Well, my question is: can I create OSDs with all disks, both SAS and
> > SSD, and then after that create the ruleset and the pool?
> > Will this cause some impact during these operations?
>
> If your OSD types aren't mixed, then best create the rule for the
> existing pool first. All data will move once the rule is applied. So
> not much movement if they are already on the correct OSD type.
>
> --
> Cheers,
> Alwin
>
> ___
> pve-user mailing list
> pve-user@pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
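The per-OSD re-class commands in Gilberto's message can be collapsed into a loop. A minimal sketch; the OSD ids 7, 8, 12, 13 and 14 are the ones from this thread, and the DRY_RUN guard is an illustration device: with its default of 1 the script only prints the ceph commands instead of executing them.

```shell
#!/bin/sh
# Re-class a list of OSDs from their detected class to ssd.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 would execute them
# and requires Ceph admin privileges on the node.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

for id in 7 8 12 13 14; do
  # the old class must be removed before a new one can be set
  run ceph osd crush rm-device-class "osd.$id"
  run ceph osd crush set-device-class ssd "osd.$id"
done
```

Afterwards `ceph osd crush tree --show-shadow` should list the re-classed OSDs under the `~ssd` shadow root, as in the output above.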
Re: [PVE-User] Create secondary pool on ceph servers..
On Tue, Apr 14, 2020 at 02:35:55PM -0300, Gilberto Nunes wrote:
> Hi there
>
> I have 7 servers with PVE 6, all updated...
> All servers are named pve1, pve2 and so on...
> pve3, pve4 and pve5 have 960 GB SSDs.
> So we decided to create a second pool that will use only these SSDs.
> I have read "Ceph CRUSH & device classes" in order to do that!
> So just to do things right, I need to check this:
> 1 - first create OSDs with all disks, SAS and SSD
> 2 - second create a different pool with the commands below:
>
> ruleset:
>
> ceph osd crush rule create-replicated
>
> create pool:
>
> ceph osd pool set crush_rule
>
> Well, my question is: can I create OSDs with all disks, both SAS and
> SSD, and then after that create the ruleset and the pool?
> Will this cause some impact during these operations?

If your OSD types aren't mixed, then best create the rule for the
existing pool first. All data will move once the rule is applied. So
not much movement if they are already on the correct OSD type.

--
Cheers,
Alwin
[PVE-User] Create secondary pool on ceph servers..
Hi there

I have 7 servers with PVE 6, all updated...
All servers are named pve1, pve2 and so on...
pve3, pve4 and pve5 have 960 GB SSDs.
So we decided to create a second pool that will use only these SSDs.
I have read "Ceph CRUSH & device classes" in order to do that!
So just to do things right, I need to check this:
1 - first create OSDs with all disks, SAS and SSD
2 - second create a different pool with the commands below:

ruleset:

ceph osd crush rule create-replicated

create pool:

ceph osd pool set crush_rule

Well, my question is: can I create OSDs with all disks, both SAS and
SSD, and then after that create the ruleset and the pool?
Will this cause some impact during these operations?

Thanks a lot
Gilberto
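The two command lines above lost their arguments in the archive. As a hedged sketch of the full syntax: the rule name SSDPOOL and pool name SSDs come from the follow-up in this thread, while the PG count of 128 is an assumption for illustration. The DRY_RUN guard (default 1) prints the commands instead of executing them.

```shell
#!/bin/sh
# Syntax: ceph osd crush rule create-replicated <rule> <root> <failure-domain> <class>
#         ceph osd pool set <pool> crush_rule <rule>
# DRY_RUN=1 (default) prints; DRY_RUN=0 would run against a live cluster.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

# rule restricted to OSDs of class ssd, replicated across hosts
run ceph osd crush rule create-replicated SSDPOOL default host ssd
# create the pool (128 PGs is an illustrative choice, size it for your cluster)
run ceph osd pool create SSDs 128 128
# point the pool at the SSD-only rule
run ceph osd pool set SSDs crush_rule SSDPOOL
```

Per Alwin's advice below, applying a class-restricted rule to an existing pool moves all of that pool's data, so doing the rule first on a fresh pool avoids the churn.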
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
On Tue, Apr 14, 2020 at 05:21:44PM +0200, Rainer Krienke wrote:
> On 14.04.20 at 16:42, Alwin Antreich wrote:
> >> According to these numbers the relation between write and read
> >> performance should be the other way round: writes should be slower
> >> than reads, but on a VM it's exactly the other way round?
> >
> > Ceph does reads in parallel, while writes are done to the primary OSD by
> > the client. And that OSD is responsible for distributing the other
> > copies.
>
> Ah yes, right. The primary OSD has to wait until all the OSDs in the PG
> have confirmed that data has been written to each of them. Reads, as
> you said, are parallel, so I would expect reading to be faster than
> writing, but for me it is *not* in a Proxmox VM with Ceph RBD storage.
>
> However, reads are faster at the Ceph level, in a rados bench run directly
> on a pxa host (no VM), which is what I would expect also for reads/writes
> inside a VM.
>
> >> Any idea why nevertheless writes on a VM are ~3 times faster than reads
> >> and what I could try to speed up reading?
> >
> > What is the byte size of bonnie++? If it uses 4 KB and data isn't in the
> > cache, whole objects need to be requested from the cluster.
>
> I did not find information about the block sizes used. The whole file that
> is written and later read again by bonnie++ is however by default at
> least twice the size of the machine's RAM.

According to the man page the chunk size is 8192 bytes by default.

> In a VM I also tried to read its own striped LV device: dd
> if=/dev/vg/testlv of=/dev/null bs=1024k status=progress (after clearing
> the VM's cache). /dev/vg/testlv is a striped LV (on 4 disks) with XFS on
> it, on which I tested the speed with bonnie++ before.
> This dd also did not go beyond about 100 MB/sec, whereas the rados bench
> promises much more.

Do you have a VM without striped volumes? I suppose there will be two
requests, one for each half of the data. That could slow down the read as
well. And you can disable the cache to verify that cache misses don't
impact the performance.

--
Cheers,
Alwin
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
On 14.04.20 at 16:42, Alwin Antreich wrote:
>> According to these numbers the relation between write and read
>> performance should be the other way round: writes should be slower
>> than reads, but on a VM it's exactly the other way round?
>
> Ceph does reads in parallel, while writes are done to the primary OSD by
> the client. And that OSD is responsible for distributing the other
> copies.

Ah yes, right. The primary OSD has to wait until all the OSDs in the PG
have confirmed that data has been written to each of them. Reads, as
you said, are parallel, so I would expect reading to be faster than
writing, but for me it is *not* in a Proxmox VM with Ceph RBD storage.

However, reads are faster at the Ceph level, in a rados bench run directly
on a pxa host (no VM), which is what I would expect also for reads/writes
inside a VM.

>> Any idea why nevertheless writes on a VM are ~3 times faster than reads
>> and what I could try to speed up reading?
>
> What is the byte size of bonnie++? If it uses 4 KB and data isn't in the
> cache, whole objects need to be requested from the cluster.

I did not find information about the block sizes used. The whole file that
is written and later read again by bonnie++ is however by default at
least twice the size of the machine's RAM.

In a VM I also tried to read its own striped LV device (after clearing
the VM's cache):

dd if=/dev/vg/testlv of=/dev/null bs=1024k status=progress

/dev/vg/testlv is a striped LV (on 4 disks) with XFS on it, on which I
tested the speed with bonnie++ before. This dd also did not go beyond
about 100 MB/sec, whereas the rados bench promises much more.

Thanks
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
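One generic knob worth checking before rerunning Rainer's sequential dd test is the guest block device's read-ahead. This is a standard Linux block-layer tunable, not something suggested in this thread; /dev/vda is a hypothetical guest disk name and 4096 KiB an example value. The DRY_RUN guard (default 1) only prints the steps.

```shell
#!/bin/sh
# Sketch: inspect and raise the guest disk's read-ahead, then retry the
# sequential read. DRY_RUN=1 (default) prints; DRY_RUN=0 executes (as root).
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

# current read-ahead in KiB for the hypothetical guest disk /dev/vda
run cat /sys/block/vda/queue/read_ahead_kb
# raise it (example value), then repeat the dd test from the message above
run sh -c 'echo 4096 > /sys/block/vda/queue/read_ahead_kb'
run dd if=/dev/vg/testlv of=/dev/null bs=1024k status=progress
```

Larger read-ahead lets the guest issue bigger sequential requests, which maps better onto Ceph's 4 MiB objects than small serial reads; whether it closes the gap here would need measuring.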
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
On Tue, Apr 14, 2020 at 03:54:30PM +0200, Rainer Krienke wrote:
> Hello,
>
> in between I learned a lot from this group (thanks a lot) to solve many
> performance problems I initially faced with Proxmox in VMs having their
> storage on Ceph RBDs.
>
> I parallelized access to many disks on a VM where possible, used
> iothreads and activated writeback cache.
>
> Running bonnie++ I am now able to get about 300 MBytes/sec block write
> performance, which is a great value because it even scales out with Ceph
> if I run the same bonnie++ on e.g. two machines. In this case I get
> 600 MBytes/sec. Great.
>
> The last strangeness I am experiencing is read performance. The same
> bonnie++ on a VM's XFS filesystem that yields 300 MB write performance
> only gets a block read of about 90 MB/sec.
>
> So on one of the pxa hosts and later also on one of the Ceph cluster
> nodes (Nautilus 14.2.8, 144 OSDs) I ran a rados bench test to see if
> Ceph is slowing down reads. The results on both systems were very
> similar. So here is the test result from the pxa host:
>
> # rados bench -p my-rbd 60 write --no-cleanup
> Total time run:         60.284332
> Total writes made:      5376
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     356.71
> Stddev Bandwidth:       46.8361
> Max bandwidth (MB/sec): 424
> Min bandwidth (MB/sec): 160
> Average IOPS:           89
> Stddev IOPS:            11
> Max IOPS:               106
> Min IOPS:               40
> Average Latency(s):     0.179274
> Stddev Latency(s):      0.105626
> Max latency(s):         1.00746
> Min latency(s):         0.0656261
>
> # echo 3 > /proc/sys/vm/drop_caches
> # rados bench -p pxa-rbd 60 seq
> Total time run:       24.208097
> Total reads made:     5376
> Read size:            4194304
> Object size:          4194304
> Bandwidth (MB/sec):   888.298
> Average IOPS:         222
> Stddev IOPS:          33
> Max IOPS:             249
> Min IOPS:             92
> Average Latency(s):   0.0714553
> Max latency(s):       0.63154
> Min latency(s):       0.0237746
>
> According to these numbers the relation between write and read
> performance should be the other way round: writes should be slower
> than reads, but on a VM it's exactly the other way round?

Ceph does reads in parallel, while writes are done to the primary OSD by
the client. And that OSD is responsible for distributing the other
copies.

> Any idea why nevertheless writes on a VM are ~3 times faster than reads
> and what I could try to speed up reading?

What is the byte size of bonnie++? If it uses 4 KB and data isn't in the
cache, whole objects need to be requested from the cluster.

--
Cheers,
Alwin
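Alwin's chunk-size point can be made concrete with quick shell arithmetic: the rados bench output above reports 4 MiB (4194304-byte) objects, and per the bonnie++ man page the default chunk size is 8192 bytes, so reading one object's worth of data in default-sized chunks takes 512 separate reads.

```shell
#!/bin/sh
# 4 MiB RADOS object size (from the rados bench output) divided by
# bonnie++'s default 8192-byte chunk size
object_bytes=$((4 * 1024 * 1024))
chunk_bytes=8192
echo $((object_bytes / chunk_bytes))   # prints 512
```

If those small reads are issued serially and miss the cache, per-request latency rather than cluster bandwidth dominates, which fits the ~90 MB/sec the VM sees against the ~888 MB/sec rados bench reports.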
Re: [PVE-User] Proxmox with ceph storage VM performance strangeness
Hello,

in between I learned a lot from this group (thanks a lot) to solve many
performance problems I initially faced with Proxmox in VMs having their
storage on Ceph RBDs.

I parallelized access to many disks on a VM where possible, used
iothreads and activated writeback cache.

Running bonnie++ I am now able to get about 300 MBytes/sec block write
performance, which is a great value because it even scales out with Ceph
if I run the same bonnie++ on e.g. two machines. In this case I get
600 MBytes/sec. Great.

The last strangeness I am experiencing is read performance. The same
bonnie++ on a VM's XFS filesystem that yields 300 MB write performance
only gets a block read of about 90 MB/sec.

So on one of the pxa hosts and later also on one of the Ceph cluster
nodes (Nautilus 14.2.8, 144 OSDs) I ran a rados bench test to see if
Ceph is slowing down reads. The results on both systems were very
similar. So here is the test result from the pxa host:

# rados bench -p my-rbd 60 write --no-cleanup
Total time run:         60.284332
Total writes made:      5376
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     356.71
Stddev Bandwidth:       46.8361
Max bandwidth (MB/sec): 424
Min bandwidth (MB/sec): 160
Average IOPS:           89
Stddev IOPS:            11
Max IOPS:               106
Min IOPS:               40
Average Latency(s):     0.179274
Stddev Latency(s):      0.105626
Max latency(s):         1.00746
Min latency(s):         0.0656261

# echo 3 > /proc/sys/vm/drop_caches
# rados bench -p pxa-rbd 60 seq
Total time run:       24.208097
Total reads made:     5376
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   888.298
Average IOPS:         222
Stddev IOPS:          33
Max IOPS:             249
Min IOPS:             92
Average Latency(s):   0.0714553
Max latency(s):       0.63154
Min latency(s):       0.0237746

According to these numbers the relation between write and read
performance should be the other way round: writes should be slower than
reads, but on a VM it's exactly the other way round?

Any idea why nevertheless writes on a VM are ~3 times faster than reads
and what I could try to speed up reading?

Thanks a lot
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312