Re: [PVE-User] Create secondary pool on ceph servers..

2020-04-14 Thread Gilberto Nunes
Oh! Sorry, Alwin.
This was somewhat urgent for me, so here is what I did...
First, I inserted all disks, both SAS and SSD, into the OSD tree.
Then I checked whether the system would detect the SSDs as ssd and the SAS
disks as hdd, but there was no difference: it showed all of them as hdd!
So I changed the classes with these commands:
ceph osd crush rm-device-class osd.7
ceph osd crush set-device-class ssd osd.7
ceph osd crush rm-device-class osd.8
ceph osd crush set-device-class ssd osd.8
ceph osd crush rm-device-class osd.12
ceph osd crush set-device-class ssd osd.12
ceph osd crush rm-device-class osd.13
ceph osd crush set-device-class ssd osd.13
ceph osd crush rm-device-class osd.14
ceph osd crush set-device-class ssd osd.14
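
For reference, the same reassignment can be scripted in a loop (a minimal
sketch; the OSD IDs are the ones from the commands above, adjust for your
cluster):

for osd in 7 8 12 13 14; do
    ceph osd crush rm-device-class "osd.${osd}"
    ceph osd crush set-device-class ssd "osd.${osd}"
done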

After that, ceph osd crush tree --show-shadow shows me the different device
classes...
ceph osd crush tree --show-shadow
ID  CLASS WEIGHT   TYPE NAME
-24   ssd  4.36394 root default~ssd
-20   ssd        0     host pve1~ssd
-21   ssd        0     host pve2~ssd
-17   ssd  0.87279     host pve3~ssd
  7   ssd  0.87279         osd.7
-18   ssd  0.87279     host pve4~ssd
  8   ssd  0.87279         osd.8
-19   ssd  0.87279     host pve5~ssd
 12   ssd  0.87279         osd.12
-22   ssd  0.87279     host pve6~ssd
 13   ssd  0.87279         osd.13
-23   ssd  0.87279     host pve7~ssd
 14   ssd  0.87279         osd.14
 -2   hdd 12.00282 root default~hdd
-10   hdd  1.09129     host pve1~hdd
  0   hdd  1.09129         osd.0
...

Then I created the rule:

ceph osd crush rule create-replicated SSDPOOL default host ssd
Then I created a pool named SSDs,

and then assigned the new rule to that pool:
ceph osd pool set SSDs crush_rule SSDPOOL

It seems to work properly...
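
A quick way to double-check that the rule really took effect and that the
pool's data lands on the SSD-class OSDs only (a small sketch, using the
names from this thread):

ceph osd pool get SSDs crush_rule    # should report: crush_rule: SSDPOOL
ceph pg ls-by-pool SSDs | head       # acting sets should only contain osd.7, 8, 12, 13, 14
ceph osd df tree                     # per-OSD usage, grouped by host and device class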

What do you think?






---
Gilberto Nunes Ferreira


On Tue, Apr 14, 2020 at 15:30, Alwin Antreich wrote:

> On Tue, Apr 14, 2020 at 02:35:55PM -0300, Gilberto Nunes wrote:
> > Hi there
> >
> > I have 7 servers with PVE 6, all updated...
> > All servers are named pve1, pve2 and so on...
> > pve3, pve4 and pve5 have 960GB SSDs.
> > So we decided to create a second pool that will use only these SSDs.
> > I have read about Ceph CRUSH & device classes in order to do that!
> > So just to do things right, I need to check this:
> > 1 - first create OSDs on all disks, both SAS and SSD
> > 2 - second, create a different pool with the commands below:
> > ruleset:
> >
> > ceph osd crush rule create-replicated <rule-name> <root> <failure-domain> <class>
> >
> > create pool
> >
> > ceph osd pool set <pool-name> crush_rule <rule-name>
> >
> >
> > Well, my question is: can I create OSDs with all disks, both SAS and
> > SSD, and then after that create the ruleset and the pool?
> > Will this generate some impact during these operations?
> If your OSD types aren't mixed, then best create the rule for the
> existing pool first. All data will move once the rule is applied, so
> there is not much movement if it is already on the correct OSD type.
>
> --
> Cheers,
> Alwin
>


Re: [PVE-User] Create secondary pool on ceph servers..

2020-04-14 Thread Alwin Antreich
On Tue, Apr 14, 2020 at 02:35:55PM -0300, Gilberto Nunes wrote:
> Hi there
> 
> I have 7 servers with PVE 6, all updated...
> All servers are named pve1, pve2 and so on...
> pve3, pve4 and pve5 have 960GB SSDs.
> So we decided to create a second pool that will use only these SSDs.
> I have read about Ceph CRUSH & device classes in order to do that!
> So just to do things right, I need to check this:
> 1 - first create OSDs on all disks, both SAS and SSD
> 2 - second, create a different pool with the commands below:
> ruleset:
> 
> ceph osd crush rule create-replicated <rule-name> <root> <failure-domain> <class>
> 
> create pool
> 
> ceph osd pool set <pool-name> crush_rule <rule-name>
> 
> 
> Well, my question is: can I create OSDs with all disks, both SAS and
> SSD, and then after that create the ruleset and the pool?
> Will this generate some impact during these operations?
If your OSD types aren't mixed, then best create the rule for the
existing pool first. All data will move once the rule is applied, so
there is not much movement if it is already on the correct OSD type.
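
As a concrete sketch of that advice (the rule and pool names below are
hypothetical, not from this thread): create an hdd-class rule and point the
existing pool at it, so its data is pinned to the spinning disks before the
SSD-only pool is added.

ceph osd crush rule create-replicated HDDPOOL default host hdd
ceph osd pool set existing-pool crush_rule HDDPOOL
ceph -s     # watch for rebalancing; ideally there is next to none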

--
Cheers,
Alwin



[PVE-User] Create secondary pool on ceph servers..

2020-04-14 Thread Gilberto Nunes
Hi there

I have 7 servers with PVE 6, all updated...
All servers are named pve1, pve2 and so on...
pve3, pve4 and pve5 have 960GB SSDs.
So we decided to create a second pool that will use only these SSDs.
I have read about Ceph CRUSH & device classes in order to do that!
So just to do things right, I need to check this:
1 - first create OSDs on all disks, both SAS and SSD
2 - second, create a different pool with the commands below:
ruleset:

ceph osd crush rule create-replicated <rule-name> <root> <failure-domain> <class>

create pool

ceph osd pool set <pool-name> crush_rule <rule-name>


Well, my question is: can I create OSDs with all disks, both SAS and
SSD, and then after that create the ruleset and the pool?
Will this generate some impact during these operations?
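
(For reference, the follow-up earlier in this archive fills in those
placeholders as follows, using the ssd device class:

ceph osd crush rule create-replicated SSDPOOL default host ssd
ceph osd pool set SSDs crush_rule SSDPOOL)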


Thanks a lot



Gilberto


Re: [PVE-User] Proxmox with ceph storage VM performance strangeness

2020-04-14 Thread Alwin Antreich
On Tue, Apr 14, 2020 at 05:21:44PM +0200, Rainer Krienke wrote:
> On 14.04.20 at 16:42, Alwin Antreich wrote:
> >> According to these numbers the relation between write and read performance
> >> should be the other way round: writes should be slower than reads, but
> >> on a VM it's exactly the other way round?
> 
> > Ceph does reads in parallel, while writes are done to the primary OSD by
> > the client. And that OSD is responsible for distributing the other
> > copies.
> 
> Ah yes, right. The primary OSD has to wait until all the OSDs in the PG
> have confirmed that the data has been written to each of them. Reads, as
> you said, are parallel, so I would expect reading to be faster than
> writing, but for me it is *not* in a Proxmox VM with Ceph RBD storage.
> 
> However, reads are faster on the Ceph level, in a rados bench directly on a
> pxa host (no VM), which is what I would also expect for reads/writes
> inside a VM.
> > 
> >>
> >> Any idea why writes on a VM are nevertheless ~3 times faster than reads,
> >> and what I could try to speed up reading?
> 
> > What block size does bonnie++ use? If it uses 4 KB and the data isn't in the
> > cache, whole objects need to be requested from the cluster.
> 
> I did not find information about the block sizes used. The whole file that is
> written and later on read again by bonnie++ is, however, by default at
> least twice the size of your RAM.
According to the man page the chunk size is 8192 bytes by default.

> 
> In a VM I also tried to read its own striped LV device: dd
> if=/dev/vg/testlv of=/dev/null bs=1024k status=progress (after clearing
> the VM's cache). /dev/vg/testlv is a striped LV (on 4 disks) with xfs on
> it, on which I tested the speed using bonnie++ before.
> This dd also did not go beyond about 100MB/sec, whereas the rados bench
> promises much more.
Do you have a VM without striped volumes? I suppose there will be two
requests, one for each half of the data. That could slow down the read as
well.

And you can disable the cache to verify that cache misses don't impact
the performance.
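
One related check from inside the guest, assuming the test is run against the
device mentioned above, is to read with direct I/O so the guest page cache
cannot influence the result (a minimal sketch):

echo 3 > /proc/sys/vm/drop_caches
dd if=/dev/vg/testlv of=/dev/null bs=4M iflag=direct status=progress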

--
Cheers,
Alwin



Re: [PVE-User] Proxmox with ceph storage VM performance strangeness

2020-04-14 Thread Rainer Krienke
On 14.04.20 at 16:42, Alwin Antreich wrote:
>> According to these numbers the relation between write and read performance
>> should be the other way round: writes should be slower than reads, but
>> on a VM it's exactly the other way round?

> Ceph does reads in parallel, while writes are done to the primary OSD by
> the client. And that OSD is responsible for distributing the other
> copies.

Ah yes, right. The primary OSD has to wait until all the OSDs in the PG
have confirmed that the data has been written to each of them. Reads, as
you said, are parallel, so I would expect reading to be faster than
writing, but for me it is *not* in a Proxmox VM with Ceph RBD storage.

However, reads are faster on the Ceph level, in a rados bench directly on a
pxa host (no VM), which is what I would also expect for reads/writes
inside a VM.
> 
>>
>> Any idea why writes on a VM are nevertheless ~3 times faster than reads,
>> and what I could try to speed up reading?

> What block size does bonnie++ use? If it uses 4 KB and the data isn't in the
> cache, whole objects need to be requested from the cluster.

I did not find information about the block sizes used. The whole file that is
written and later on read again by bonnie++ is, however, by default at
least twice the size of your RAM.

In a VM I also tried to read its own striped LV device: dd
if=/dev/vg/testlv of=/dev/null bs=1024k status=progress (after clearing
the VM's cache). /dev/vg/testlv is a striped LV (on 4 disks) with xfs on
it, on which I tested the speed using bonnie++ before.
This dd also did not go beyond about 100MB/sec, whereas the rados bench
promises much more.
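
One thing worth noting here: rados bench issues 16 concurrent operations by
default, while dd reads one block at a time, so the two numbers are not
directly comparable. A queued read test inside the VM would be a closer
comparison (a sketch; fio is not mentioned in this thread and is assumed to
be installed in the guest):

fio --name=seqread --filename=/dev/vg/testlv --readonly --rw=read --bs=4M \
    --direct=1 --ioengine=libaio --iodepth=16 --runtime=60 --time_based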

Thanks
Rainer

-- 
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse  1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287
1001312


Re: [PVE-User] Proxmox with ceph storage VM performance strangeness

2020-04-14 Thread Alwin Antreich
On Tue, Apr 14, 2020 at 03:54:30PM +0200, Rainer Krienke wrote:
> Hello,
> 
> In the meantime I have learned a lot from this group (thanks a lot), which
> helped me solve many performance problems I initially faced with Proxmox in
> VMs that have their storage on Ceph RBDs.
> 
> I parallelized access to many disks on a VM where possible, used
> iothreads and activated the writeback cache.
> 
> Running bonnie++ I am now able to get about 300 MBytes/sec block write
> performance, which is a great value because it even scales out with Ceph:
> if I run the same bonnie++ on e.g. two machines, I get
> 600 MBytes/sec. Great.
> 
> The last strangeness I am experiencing is read performance. The same
> bonnie++ on a VM's xfs filesystem that yields 300 MB/sec write performance
> only gets a block read of about 90 MB/sec.
> 
> So on one of the pxa hosts and later also on one of the Ceph cluster
> nodes (Nautilus 14.2.8, 144 OSDs) I ran a rados bench test to see if Ceph
> is slowing down reads. The results on both systems were very similar. So
> here is the test result from the pxa host:
> 
> # rados bench -p my-rbd 60 write --no-cleanup
> Total time run:         60.284332
> Total writes made:      5376
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     356.71
> Stddev Bandwidth:       46.8361
> Max bandwidth (MB/sec): 424
> Min bandwidth (MB/sec): 160
> Average IOPS:           89
> Stddev IOPS:            11
> Max IOPS:               106
> Min IOPS:               40
> Average Latency(s):     0.179274
> Stddev Latency(s):      0.105626
> Max latency(s):         1.00746
> Min latency(s):         0.0656261
> 
> # echo 3 > /proc/sys/vm/drop_caches
> # rados bench -p pxa-rbd 60 seq
> Total time run:       24.208097
> Total reads made:     5376
> Read size:            4194304
> Object size:          4194304
> Bandwidth (MB/sec):   888.298
> Average IOPS:         222
> Stddev IOPS:          33
> Max IOPS:             249
> Min IOPS:             92
> Average Latency(s):   0.0714553
> Max latency(s):       0.63154
> Min latency(s):       0.0237746
> 
> 
> According to these numbers the relation between write and read performance
> should be the other way round: writes should be slower than reads, but
> on a VM it's exactly the other way round?
Ceph does reads in parallel, while writes are done to the primary OSD by
the client. And that OSD is responsible for distributing the other
copies.

> 
> Any idea why writes on a VM are nevertheless ~3 times faster than reads,
> and what I could try to speed up reading?
What block size does bonnie++ use? If it uses 4 KB and the data isn't in the
cache, whole objects need to be requested from the cluster.

--
Cheers,
Alwin



Re: [PVE-User] Proxmox with ceph storage VM performance strangeness

2020-04-14 Thread Rainer Krienke
Hello,

In the meantime I have learned a lot from this group (thanks a lot), which
helped me solve many performance problems I initially faced with Proxmox in
VMs that have their storage on Ceph RBDs.

I parallelized access to many disks on a VM where possible, used
iothreads and activated the writeback cache.

Running bonnie++ I am now able to get about 300 MBytes/sec block write
performance, which is a great value because it even scales out with Ceph:
if I run the same bonnie++ on e.g. two machines, I get
600 MBytes/sec. Great.

The last strangeness I am experiencing is read performance. The same
bonnie++ on a VM's xfs filesystem that yields 300 MB/sec write performance
only gets a block read of about 90 MB/sec.

So on one of the pxa hosts and later also on one of the Ceph cluster
nodes (Nautilus 14.2.8, 144 OSDs) I ran a rados bench test to see if Ceph
is slowing down reads. The results on both systems were very similar. So
here is the test result from the pxa host:

# rados bench -p my-rbd 60 write --no-cleanup
Total time run:         60.284332
Total writes made:      5376
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     356.71
Stddev Bandwidth:       46.8361
Max bandwidth (MB/sec): 424
Min bandwidth (MB/sec): 160
Average IOPS:           89
Stddev IOPS:            11
Max IOPS:               106
Min IOPS:               40
Average Latency(s):     0.179274
Stddev Latency(s):      0.105626
Max latency(s):         1.00746
Min latency(s):         0.0656261

# echo 3 > /proc/sys/vm/drop_caches
# rados bench -p pxa-rbd 60 seq
Total time run:       24.208097
Total reads made:     5376
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   888.298
Average IOPS:         222
Stddev IOPS:          33
Max IOPS:             249
Min IOPS:             92
Average Latency(s):   0.0714553
Max latency(s):       0.63154
Min latency(s):       0.0237746
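
A side note on cleanup: the write run used --no-cleanup so that the seq read
has data to work on, which means the benchmark objects stay in the pool
afterwards. They can be removed with rados' cleanup subcommand, e.g.:

rados -p my-rbd cleanup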


According to these numbers the relation between write and read performance
should be the other way round: writes should be slower than reads, but
on a VM it's exactly the other way round?

Any idea why writes on a VM are nevertheless ~3 times faster than reads,
and what I could try to speed up reading?

Thanks a lot
Rainer
-- 
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse  1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287
1001312


