[ceph-users] Re: Nic bonding (lacp) settings for ceph

2021-06-28 Thread Marc 'risson' Schmitt
On Mon, 28 Jun 2021 22:35:36 +0300
mhnx  wrote:
> To be clear.
> I have stacked switches and this is my configuration.
> 
> Bonding cluster: (hash 3+4)
> Cluster nic1(10Gbe) -> Switch A
> Cluster nic2(10Gbe) -> Switch B
> 
> Bonding public: (hash 3+4)
> Public  nic1(10Gbe) -> Switch A
> Public  nic2(10Gbe) -> Switch B
> 
> Data distribution wasn't good at the beginning due to layer2 bonding.
> With the layer3+4 hash it's better now.
> 
> But when I test the network with "iperf -parallel 2" and
> "ad_select=stable", sometimes it uses both NICs and sometimes only
> one. After that I changed to "ad_select=bandwidth" and the data
> distribution looked better. Every iperf test was successful, and when
> one port already had traffic going on, the next request always used
> the free port. That's why I'm digging into it. If it doesn't have any
> downside or overhead, then "bandwidth" is the winner in my tests. I
> will share the test results in my next mail. PS: How should I test
> latency?

iperf --parallel chooses random source ports, so since you enabled
layer3+4 hashing you will get varying results depending on which ports
happen to be selected.
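
If you want repeatable results, one option is to pin the ports yourself
instead of relying on --parallel. A minimal sketch with iperf3, assuming
the server is reachable as <server-ip> and that your iperf3 build has
--cport (the port numbers are arbitrary examples):

  # server side: one listener per fixed port
  iperf3 -s -p 5201 &
  iperf3 -s -p 5202 &

  # client side: two streams with fixed source and destination ports,
  # so the layer3+4 hash input (and therefore the slave selection) is
  # the same on every run
  iperf3 -c <server-ip> -p 5201 --cport 50001 -t 30 &
  iperf3 -c <server-ip> -p 5202 --cport 50002 -t 30 &

Varying only the port numbers then shows which port combinations land on
which slave.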

If your switches are stacked and handle bonding across both of them,
which I'm guessing they do, you probably don't need ad_select=bandwidth
for the reasons explained by Andrew.

> I'm not a network expert; I'm just trying to understand the concept.
> My switch is a layer2+3 TOR switch, and I use standard active-active
> port-channel settings. I wonder, if I don't change the switch side to
> layer3+4, what will the effect be on the rest?
> I think TX will be shared across both NICs but RX will always use one
> NIC, because the switch's hash algorithm is different, but that's just
> a guess.

There shouldn't be any problem setting layer3+4 on one side and layer2
on the other, so you can change that setting on your switch without
having to worry about breaking other bonds set up on it.
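
As a quick sanity check of the Linux side, the bonding driver exposes
its current settings through sysfs; a minimal sketch, assuming the bond
is named bond0:

  cat /sys/class/net/bond0/bonding/mode              # e.g. "802.3ad 4"
  cat /sys/class/net/bond0/bonding/xmit_hash_policy  # e.g. "layer3+4 1"
  cat /sys/class/net/bond0/bonding/ad_select         # e.g. "bandwidth 1"

The switch-side hash policy is configured independently on the switch
and, as above, does not have to match.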

-- 
Marc 'risson' Schmitt
CRI - EPITA
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nic bonding (lacp) settings for ceph

2021-06-28 Thread mhnx
To be clear.
I have stacked switches and this is my configuration.

Bonding cluster: (hash 3+4)
Cluster nic1(10Gbe) -> Switch A
Cluster nic2(10Gbe) -> Switch B

Bonding public: (hash 3+4)
Public  nic1(10Gbe) -> Switch A
Public  nic2(10Gbe) -> Switch B

Data distribution wasn't good at the beginning due to layer2 bonding. With
the layer3+4 hash it's better now.

But when I test the network with "iperf -parallel 2" and "ad_select=stable",
sometimes it uses both NICs and sometimes only one.
After that I changed to "ad_select=bandwidth" and the data distribution
looked better. Every iperf test was successful, and when one port already
had traffic going on, the next request always used the free port.
That's why I'm digging into it. If it doesn't have any downside or overhead,
then "bandwidth" is the winner in my tests. I will share the test results in
my next mail. PS: How should I test latency?

I'm not a network expert; I'm just trying to understand the concept. My
switch is a layer2+3 TOR switch, and I use standard active-active
port-channel settings. I wonder, if I don't change the switch side to
layer3+4, what will the effect be on the rest?
I think TX will be shared across both NICs but RX will always use one NIC,
because the switch's hash algorithm is different, but that's just a guess.


On Mon, 28 Jun 2021 at 21:38, Andrew Walker-Brown <
andrew_jbr...@hotmail.com> wrote:

> Hi,
>
>
>
> I think ad_select is only relevant in the scenario below, i.e. where you
> have more than one port-channel being presented to the Linux bond.  So
> below, you have 2 port channels, one from each switch, but at the Linux
> side all the ports involved are slaves in the same bond.  In your scenario
> it sounds like you just have one switch with one port-channel to one bond
> on Linux.  So in the case of ad_select, I doubt it has any impact.  The
> main thing will be the xmit-hash-policy on both the switches and Linux.
> FWIW, I use layer3+4 on Linux and something very close to that on my S
> series switches, and both 10G links get used pretty well.  (below was
> lifted from a stackexchange thread)
>
>
>
>  .----------.   .----------.
>  | Switch 1 |   | Switch 2 |
>  '-=------=-'   '-=------=-'
>    |      |       |      |
>    |      |       |      |
> .--=------=-------=------=--.
> | eth0 | eth1 | eth2 | eth3 |
> |---------------------------|
> |           bond0           |
> '---------------------------'
>
> Where each switch has its two ports configured in a PortChannel, the
> Linux end with the LACP bond will negotiate two Aggregator IDs:
>
> Aggregator ID 1
>  - eth0 and eth1
>
> Aggregator ID 2
>  - eth2 and eth3
>
> And the switches will each have a view completely separate from the
> other.
>
> Switch 1 will think:
>
> Switch 1
>  PortChannel 1
>  - port X
>  - port Y
>
> Switch 2 will think:
>
> Switch 2
>  PortChannel 1
>  - port X
>  - port Y
>
> From the Linux system with the bond, only one Aggregator will be used at a
> given time, and will fail over depending on ad_select.
>
> So assuming Aggregator ID 1 is in use, and you pull eth0's cable out, the
> default behaviour is to stay on Aggregator ID 1.
>
> However, Aggregator ID 1 only has 1 cable, and there's a spare Aggregator
> ID 2 with 2 cables - twice the bandwidth!
>
> If you use ad_select=count or ad_select=bandwidth, the active Aggregator
> ID fails over to an Aggregator with the most cables or the most bandwidth.
>
> Note that LACP mandates an Aggregator's ports must all be the same speed
> and duplex, so I believe you could configure one Aggregator with 1Gbps
> ports and one Aggregator with 10Gbps ports, and have intelligent selection
> depending on whether you have 20/10/2/1Gbps available.
>
> From: mhnx
> Sent: 28 June 2021 18:46
> To: Marc 'risson' Schmitt
> Cc: Ceph Users
> Subject: [ceph-users] Re: Nic bonding (lacp) settings for ceph
>
>
>
> Thanks for the answer.
> I'm interested in ad_select=bandwidth because we use the OSD nodes as RGW
> gateways, VMs and different applications.
>
> I have separate cluster (10+10Gbe) and public (10+10Gbe) networks.
> I tested stable, bandwidth and count. The results are clearly better with
> bandwidth; count is the worst option.
> But I wonder whether the bandwidth calculation has any effect on network
> delay? If it does I will return to stable. I don't know yet, but when I
> think about it, if every time the bonding driver needs to calculate bandwit

[ceph-users] Re: Nic bonding (lacp) settings for ceph

2021-06-28 Thread Andrew Walker-Brown
Hi,

I think ad_select is only relevant in the scenario below, i.e. where you have
more than one port-channel being presented to the Linux bond.  So below, you 
have 2 port channels, one from each switch, but at the Linux side all the ports 
involved are slaves in the same bond.  In your scenario it sounds like you just 
have one switch with one port-channel to one bond on Linux.  So in the case of 
ad_select, I doubt it has any impact.  The main thing will be the 
xmit-hash-policy on both the switches and Linux.  FWIW, I use layer3+4 on Linux 
and something very close to that on my S series switches, and both 10G links 
get used pretty well.  (below was lifted from a stackexchange thread)


 .----------.   .----------.
 | Switch 1 |   | Switch 2 |
 '-=------=-'   '-=------=-'
   |      |       |      |
   |      |       |      |
.--=------=-------=------=--.
| eth0 | eth1 | eth2 | eth3 |
|---------------------------|
|           bond0           |
'---------------------------'

Where each switch has its two ports configured in a PortChannel, the Linux end
with the LACP bond will negotiate two Aggregator IDs:

Aggregator ID 1
 - eth0 and eth1

Aggregator ID 2
 - eth2 and eth3

And the switches will each have a view completely separate from the other.

Switch 1 will think:

Switch 1
 PortChannel 1
 - port X
 - port Y

Switch 2 will think:

Switch 2
 PortChannel 1
 - port X
 - port Y

From the Linux system with the bond, only one Aggregator will be used at a
given time, and will fail over depending on ad_select.

So assuming Aggregator ID 1 is in use, and you pull eth0's cable out, the 
default behaviour is to stay on Aggregator ID 1.

However, Aggregator ID 1 only has 1 cable, and there's a spare Aggregator ID 2 
with 2 cables - twice the bandwidth!

If you use ad_select=count or ad_select=bandwidth, the active Aggregator ID 
fails over to an Aggregator with the most cables or the most bandwidth.

Note that LACP mandates an Aggregator's ports must all be the same speed and 
duplex, so I believe you could configure one Aggregator with 1Gbps ports and 
one Aggregator with 10Gbps ports, and have intelligent selection depending on 
whether you have 20/10/2/1Gbps available.
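
If you want to see which aggregator is actually active (and how ad_select
played out after pulling a cable), the bonding driver reports this under
/proc; a minimal sketch, assuming the bond is named bond0:

  # overall 802.3ad state, including the active aggregator ID
  grep -A 4 "Active Aggregator Info" /proc/net/bonding/bond0

  # aggregator membership, per slave and for the active aggregator
  grep -E "Slave Interface|Aggregator ID" /proc/net/bonding/bond0

Re-running the second command after unplugging a port is an easy way to
watch the failover behaviour described above.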

From: mhnx <morphinwith...@gmail.com>
Sent: 28 June 2021 18:46
To: Marc 'risson' Schmitt <ris...@cri.epita.fr>
Cc: Ceph Users <ceph-users@ceph.io>
Subject: [ceph-users] Re: Nic bonding (lacp) settings for ceph

Thanks for the answer.
I'm interested in ad_select=bandwidth because we use the OSD nodes as RGW
gateways, VMs and different applications.

I have separate cluster (10+10Gbe) and public (10+10Gbe) networks.
I tested stable, bandwidth and count. The results are clearly better with
bandwidth; count is the worst option.
But I wonder whether the bandwidth calculation has any effect on network
delay? If it does, I will return to stable. I don't know yet, but when I
think about it, if the bonding driver has to calculate bandwidth and make a
decision every time, that should add some CPU load and delay. If it has no
effect, then bandwidth will improve the distribution.

Now I know that I have to use layer3+4, but I still can't decide on
ad_select. Bandwidth or stable?
Can we discuss it please?

On Mon, 28 Jun 2021 at 20:15, Marc 'risson' Schmitt
wrote:

> Hi,
>
> On Sat, 26 Jun 2021 16:47:19 +0300
> mhnx  wrote:
> > I've changed ad_select to bandwidth and both NICs are in use now, but
> > the layer2 hash prevents dual-NIC usage between two nodes (because
> > layer2 hashing uses only the MAC address).
>
> As I understand it, setting ad_select to bandwidth is only going to be
> useful if you have several link aggregates in the same bond, like when
> you are connected in LACP to multiple (non-stacked) switches.
>
> > People advise using layer2+3 for best performance, but it has no
> > effect on OSDs because the MAC and IP are the same.
> > I've tried layer3+4 to split by ports instead of MAC and it works. But
> > I don't know what the effect will be, and my switch is layer2.
>
> We are setting layer3+4 on both our servers and our switches.
>
> Regards,
>
> --
> Marc 'risson' Schmitt
> CRI - EPITA
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nic bonding (lacp) settings for ceph

2021-06-28 Thread mhnx
Thanks for the answer.
I'm interested in ad_select=bandwidth because we use the OSD nodes as RGW
gateways, VMs and different applications.

I have separate cluster (10+10Gbe) and public (10+10Gbe) networks.
I tested stable, bandwidth and count. The results are clearly better with
bandwidth; count is the worst option.
But I wonder whether the bandwidth calculation has any effect on network
delay? If it does, I will return to stable. I don't know yet, but when I
think about it, if the bonding driver has to calculate bandwidth and make a
decision every time, that should add some CPU load and delay. If it has no
effect, then bandwidth will improve the distribution.

Now I know that I have to use layer3+4, but I still can't decide on
ad_select. Bandwidth or stable?
Can we discuss it please?

On Mon, 28 Jun 2021 at 20:15, Marc 'risson' Schmitt
wrote:

> Hi,
>
> On Sat, 26 Jun 2021 16:47:19 +0300
> mhnx  wrote:
> > I've changed ad_select to bandwidth and both NICs are in use now, but
> > the layer2 hash prevents dual-NIC usage between two nodes (because
> > layer2 hashing uses only the MAC address).
>
> As I understand it, setting ad_select to bandwidth is only going to be
> useful if you have several link aggregates in the same bond, like when
> you are connected in LACP to multiple (non-stacked) switches.
>
> > People advise using layer2+3 for best performance, but it has no
> > effect on OSDs because the MAC and IP are the same.
> > I've tried layer3+4 to split by ports instead of MAC and it works. But
> > I don't know what the effect will be, and my switch is layer2.
>
> We are setting layer3+4 on both our servers and our switches.
>
> Regards,
>
> --
> Marc 'risson' Schmitt
> CRI - EPITA
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nic bonding (lacp) settings for ceph

2021-06-28 Thread Marc 'risson' Schmitt
Hi,

On Sat, 26 Jun 2021 16:47:19 +0300
mhnx  wrote:
> I've changed ad_select to bandwidth and both NICs are in use now, but
> the layer2 hash prevents dual-NIC usage between two nodes (because
> layer2 hashing uses only the MAC address).

As I understand it, setting ad_select to bandwidth is only going to be
useful if you have several link aggregates in the same bond, like when
you are connected in LACP to multiple (non-stacked) switches.

> People advise using layer2+3 for best performance, but it has no
> effect on OSDs because the MAC and IP are the same.
> I've tried layer3+4 to split by ports instead of MAC and it works. But
> I don't know what the effect will be, and my switch is layer2.

We are setting layer3+4 on both our servers and our switches.
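
For reference, a minimal sketch of what that can look like on the Linux
side with plain iproute2. The interface names ens1f0/ens1f1 are
placeholders, and this is not persistent across reboots; use your
distribution's network configuration for that:

  # create the bond with the options discussed in this thread
  ip link add bond0 type bond mode 802.3ad miimon 100 \
      xmit_hash_policy layer3+4 ad_select stable   # or bandwidth/count
  # slaves must be down before being enslaved
  ip link set ens1f0 down
  ip link set ens1f1 down
  ip link set ens1f0 master bond0
  ip link set ens1f1 master bond0
  ip link set bond0 up
  ip link set ens1f0 up
  ip link set ens1f1 up

The switch-side port-channel and its hash policy are configured
separately on the switch.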

Regards,

-- 
Marc 'risson' Schmitt
CRI - EPITA
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io