Re: [ceph-users] Latency for the Public Network

2018-02-06 Thread Christian Balzer

Hello,

On Tue, 6 Feb 2018 09:21:22 +0100 Tobias Kropf wrote:

> On 02/06/2018 04:03 AM, Christian Balzer wrote:
> > Hello,
> >
> > On Mon, 5 Feb 2018 22:04:00 +0100 Tobias Kropf wrote:
> >  
> >> Hi ceph list,
> >>
> >> we have a hyperconverged ceph cluster with kvm on 8 nodes running ceph
> >> hammer 0.94.10.   
> > Do I smell Proxmox?  
> Yes, we currently use Proxmox.
> >  
> >> The cluster is now 3 years old and we are planning a new
> >> cluster for a high-IOPS project. We use replicated pools 3/2 and do
> >> not have the best latency on our switch backend.
> >>
> >>
> >> ping -s 8192 10.10.10.40 
> >>
> >> 8200 bytes from 10.10.10.40: icmp_seq=1 ttl=64 time=0.153 ms
> >>  
> > Not particularly great, yes.
> > However your network latency is only one factor; the Ceph OSDs add quite
> > another layer on top and usually affect IOPS even more. 
> > For high IOPS you of course need fast storage, network AND CPUs.   
> Yes, we know that... the network is our first job. We are planning new
> hardware for mon and osd services with a lot of NVMe flash disks and
> high-GHz CPUs.
> >  
> >> We plan to split the hyperconverged setup into storage and compute nodes
> >> and want to split the ceph cluster and public networks: the cluster
> >> network on 40 gbit mellanox switches and the public network on the
> >> existing 10gbit switches.
> >>  
> > You'd do a lot better if you were to go all 40Gb/s and forget about
> > splitting networks.   
> Use the public and cluster networks over the same NICs and the same subnet?

Yes, at least for NICs. 
If for some reason your compute nodes have no dedicated links/NICs for the
Ceph cluster and it makes you feel warm and fuzzy, you can segregate
traffic with VLANs. 
But in most cases that really comes down to "security theater": if a
compute node gets compromised, the attacker has access to your ceph cluster
network anyway.

When looking through the ML archives you'll find a number of people
suggesting to keep things simple unless there is a reason not to. 
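
A single combined network can be expressed in a minimal ceph.conf sketch like the one below; the subnets are placeholders, not taken from this thread:

```ini
[global]
# One network for both client and replication traffic:
public network = 10.10.10.0/24
# Simply omit "cluster network"; the OSDs then use the public
# network for replication and heartbeats as well.
# A split setup would instead add something like:
# cluster network = 10.10.20.0/24
```

Dropping the second subnet also removes a whole class of failure modes where the cluster network degrades while the public network (and thus the monitors' view) still looks healthy.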

> >
> > The faster replication network will:
> > a) be underutilized all of the time in terms of bandwidth 
> > b) not help with read IOPS at all
> > c) still be hobbled by the public network latency when it comes to write
> > IOPS (but of course help in regards to replication latency). 
> >  
> >> Now my question... are 0.153ms - 0.170ms fast enough for the public
> >> network? We must deploy a setup with 1500 - 2000 terminal servers.
> >>  
> > Define terminal server, are we talking Windows Virtual Desktops with RDP?
> > Windows is quite the hog when it comes to I/O.  
> Yes, we are talking about windows virtual desktops with rdp.
> Our calculation is: 1x DC = 60-80 IOPS, 1x TS = 60-80 IOPS, plus N users
> * 10 IOPS ...
> 
> For this system we want to work with cache tiering in front, with NVMe
> disks, and SATA disks in an EC pool. Is it a good idea to use cache
> tiering in this setup?
> 
Depends on the size of your cache-tier, really.
I have done no analysis of Windows I/O behavior other than noting that it is
insanely swap-happy without need, so if you can, eliminate the pagefile. 

If all your typical writes can be satisfied from the cache-tier, good.
Reads (like OS boot, etc.) should be fine coming from the EC pool, so run
the cache-tier in read-forward mode. 

But you _really_ need to test this; an ill-fitting cache-tier can be worse
than no cache at all.
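
For reference, wiring an NVMe cache pool in front of an EC pool looks roughly like the sketch below. The pool names are placeholders, and the exact set of cache modes available should be checked against your Ceph release:

```shell
# Attach a (hypothetical) NVMe pool "cache-nvme" in front of the EC pool "rbd-ec"
ceph osd tier add rbd-ec cache-nvme
# Serve cache misses by reading from the base pool, absorb writes in the tier
ceph osd tier cache-mode cache-nvme readforward
# Route client traffic for rbd-ec through the cache tier
ceph osd tier set-overlay rbd-ec cache-nvme
# Basic sizing/flush knobs -- tune these against measured load, not guesses
ceph osd pool set cache-nvme hit_set_type bloom
ceph osd pool set cache-nvme target_max_bytes 1099511627776   # e.g. 1 TiB
ceph osd pool set cache-nvme cache_target_dirty_ratio 0.4
```

The sizing values above are illustrative only; getting `target_max_bytes` and the dirty/full ratios wrong is exactly how a cache tier ends up slower than no cache.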

Christian

> 
> >
> > Regards,
> >
> > Christian  
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Latency for the Public Network

2018-02-06 Thread Tobias Kropf


On 02/06/2018 04:03 AM, Christian Balzer wrote:
> Hello,
>
> On Mon, 5 Feb 2018 22:04:00 +0100 Tobias Kropf wrote:
>
>> Hi ceph list,
>>
>> we have a hyperconverged ceph cluster with kvm on 8 nodes running ceph
>> hammer 0.94.10. 
> Do I smell Proxmox?
Yes, we currently use Proxmox.
>
>> The cluster is now 3 years old and we are planning a new
>> cluster for a high-IOPS project. We use replicated pools 3/2 and do
>> not have the best latency on our switch backend.
>>
>>
>> ping -s 8192 10.10.10.40 
>>
>> 8200 bytes from 10.10.10.40: icmp_seq=1 ttl=64 time=0.153 ms
>>
> Not particularly great, yes.
> However your network latency is only one factor; the Ceph OSDs add quite
> another layer on top and usually affect IOPS even more. 
> For high IOPS you of course need fast storage, network AND CPUs. 
Yes, we know that... the network is our first job. We are planning new
hardware for mon and osd services with a lot of NVMe flash disks and
high-GHz CPUs.
>
>> We plan to split the hyperconverged setup into storage and compute nodes
>> and want to split the ceph cluster and public networks: the cluster
>> network on 40 gbit mellanox switches and the public network on the
>> existing 10gbit switches.
>>
> You'd do a lot better if you were to go all 40Gb/s and forget about
> splitting networks. 
Use the public and cluster networks over the same NICs and the same subnet?
>
> The faster replication network will:
> a) be underutilized all of the time in terms of bandwidth 
> b) not help with read IOPS at all
> c) still be hobbled by the public network latency when it comes to write
> IOPS (but of course help in regards to replication latency). 
>
>> Now my question... are 0.153ms - 0.170ms fast enough for the public
>> network? We must deploy a setup with 1500 - 2000 terminal servers.
>>
> Define terminal server, are we talking Windows Virtual Desktops with RDP?
> Windows is quite the hog when it comes to I/O.
Yes, we are talking about windows virtual desktops with rdp.
Our calculation is: 1x DC = 60-80 IOPS, 1x TS = 60-80 IOPS, plus N users
* 10 IOPS ...
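
That back-of-the-envelope estimate can be written out as a small sketch. The per-role figures are the assumptions from this thread (upper bounds), and the counts in the example call are made up, not deployment numbers:

```python
# Rough client-side IOPS sizing from the rules of thumb above.
# All per-role figures are assumptions from the thread, not measurements.
DC_IOPS = 80      # upper bound per domain controller
TS_IOPS = 80      # upper bound per terminal server
USER_IOPS = 10    # steady-state per logged-in user

def required_iops(num_dc: int, num_ts: int, num_users: int) -> int:
    """Sum the per-role estimates into a raw IOPS target."""
    return num_dc * DC_IOPS + num_ts * TS_IOPS + num_users * USER_IOPS

# Hypothetical example: 2 DCs, 100 terminal servers, 2000 users
print(required_iops(2, 100, 2000))  # -> 28160
```

Note this is a raw client-side figure; with replication 3/2 every write lands three times on the OSDs, so the backend needs correspondingly more write capacity.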

For this system we want to work with cache tiering in front, with NVMe
disks, and SATA disks in an EC pool. Is it a good idea to use cache
tiering in this setup?


>
> Regards,
>
> Christian

-- 
Tobias Kropf

 

Technik

--

inett GmbH » Ihr IT Systemhaus in Saarbrücken

Mainzerstrasse 183
66121 Saarbrücken
Geschäftsführer: Marco Gabriel
Handelsregister Saarbrücken
HRB 16588


Telefon: 0681 / 41 09 93 – 0
Telefax: 0681 / 41 09 93 – 99
E-Mail: i...@inett.de
Web: www.inett.de

Cyberoam Gold Partner - Zarafa Gold Partner - Proxmox Authorized Reseller - 
Proxmox Training Center - SEP sesam Certified Partner – Open-E Partner - Endian 
Certified Partner - Kaspersky Silver Partner – ESET Silver Partner - Mitglied 
im iTeam Systemhausverbund für den Mittelstand 



Re: [ceph-users] Latency for the Public Network

2018-02-05 Thread Christian Balzer

Hello,

On Mon, 5 Feb 2018 22:04:00 +0100 Tobias Kropf wrote:

> Hi ceph list,
> 
> we have a hyperconverged ceph cluster with kvm on 8 nodes running ceph
> hammer 0.94.10. 
Do I smell Proxmox?

> The cluster is now 3 years old and we are planning a new
> cluster for a high-IOPS project. We use replicated pools 3/2 and do
> not have the best latency on our switch backend.
> 
> 
> ping -s 8192 10.10.10.40 
> 
> 8200 bytes from 10.10.10.40: icmp_seq=1 ttl=64 time=0.153 ms
> 
Not particularly great, yes.
However your network latency is only one factor; the Ceph OSDs add quite
another layer on top and usually affect IOPS even more. 
For high IOPS you of course need fast storage, network AND CPUs. 

> 
> We plan to split the hyperconverged setup into storage and compute nodes
> and want to split the ceph cluster and public networks: the cluster
> network on 40 gbit mellanox switches and the public network on the
> existing 10gbit switches.
> 
You'd do a lot better if you were to go all 40Gb/s and forget about
splitting networks. 

The faster replication network will:
a) be underutilized all of the time in terms of bandwidth 
b) not help with read IOPS at all
c) still be hobbled by the public network latency when it comes to write
IOPS (but of course help in regards to replication latency). 

> Now my question... are 0.153ms - 0.170ms fast enough for the public
> network? We must deploy a setup with 1500 - 2000 terminal servers.
>
Define terminal server, are we talking Windows Virtual Desktops with RDP?
Windows is quite the hog when it comes to I/O.

Regards,

Christian


[ceph-users] Latency for the Public Network

2018-02-05 Thread Tobias Kropf
Hi ceph list,

we have a hyperconverged ceph cluster with kvm on 8 nodes running ceph
hammer 0.94.10. The cluster is now 3 years old and we are planning a new
cluster for a high-IOPS project. We use replicated pools 3/2 and do not
have the best latency on our switch backend.


ping -s 8192 10.10.10.40

8200 bytes from 10.10.10.40: icmp_seq=1 ttl=64 time=0.153 ms


We plan to split the hyperconverged setup into storage and compute nodes
and want to split the ceph cluster and public networks: the cluster
network on 40 gbit mellanox switches and the public network on the
existing 10gbit switches.

Now my question... are 0.153ms - 0.170ms fast enough for the public
network? We must deploy a setup with 1500 - 2000 terminal servers.


Does anyone have experience with a lot of terminal servers on a ceph backend?


Thanks for any replies...


-- 
Tobias Kropf

 

