Greetings,

After two days of agony and archaeological digging, I find myself
forced to ask the question here as well. So:

Given: X machines, model HP BL460 G7, i.e. blades. Each of them
contains two network cards with 2 ports apiece, namely:

[root@host01 ~]# lspci | grep -i ether
02:00.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)
02:00.1 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)
09:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet (rev 20)
09:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet (rev 20)

Naturally, I wanted to pair up ports from different cards, to avoid
unpleasant situations along the lines of "an Ethernet controller just
died". Conclusion: bond0 = eth0 + eth2 and bond1 = eth1 + eth3.
Definitions:

[root@host01 network-scripts]# cat ifcfg-eth*
# Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet
DEVICE=eth0
BOOTPROTO=none
SLAVE=yes
MASTER=bond0
ONBOOT=yes
# Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet
DEVICE=eth1
ONBOOT=yes
HOTPLUG=no
SLAVE=yes
MASTER=bond1
# Emulex Corporation OneConnect 10Gb NIC (be3)
DEVICE=eth2
ONBOOT=yes
SLAVE=yes
MASTER=bond0
# Emulex Corporation OneConnect 10Gb NIC (be3)
DEVICE=eth3
ONBOOT=yes
HOTPLUG=no
SLAVE=yes
MASTER=bond1
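
The pairing rule above (each bond takes one port from each card) can be
checked mechanically. What follows is only a minimal sketch: it assumes
the comment line above each DEVICE entry names the right controller, and
the names IFCFG_DUMP, controllers_per_bond and spans_two_cards are
hypothetical helpers, not part of any RHEL tool. The sample is the
listing above, trimmed to the lines the check uses.

```python
from collections import defaultdict

# Listing from "cat ifcfg-eth*", trimmed to comment, DEVICE and MASTER lines.
IFCFG_DUMP = """\
# Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet
DEVICE=eth0
MASTER=bond0
# Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet
DEVICE=eth1
MASTER=bond1
# Emulex Corporation OneConnect 10Gb NIC (be3)
DEVICE=eth2
MASTER=bond0
# Emulex Corporation OneConnect 10Gb NIC (be3)
DEVICE=eth3
MASTER=bond1
"""


def controllers_per_bond(dump):
    """Map each MASTER to the set of controller descriptions of its slaves."""
    bonds = defaultdict(set)
    controller = None
    for line in dump.splitlines():
        line = line.strip()
        if line.startswith("#"):
            # Assumption: the comment describes the controller of the next DEVICE.
            controller = line.lstrip("# ").strip()
        elif line.startswith("MASTER="):
            bonds[line.split("=", 1)[1]].add(controller)
    return dict(bonds)


def spans_two_cards(dump):
    """True for every bond whose slaves sit on at least two controllers."""
    return {bond: len(cards) >= 2
            for bond, cards in controllers_per_bond(dump).items()}


if __name__ == "__main__":
    print(spans_two_cards(IFCFG_DUMP))
```

A bond reported as False would have both of its ports on the same card,
which is exactly the situation the pairing is meant to avoid.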

[root@host01 network-scripts]# cat ifcfg-bond*
# bond0 device comprised of eth2 (PCI, default active) and eth0 (onboard, default standby)
DEVICE=bond0
BONDING_OPTS="mode=1 arp_interval=100 arp_ip_target=x.y.z.5 fail_over_mac=1 arp_validate=1"
BOOTPROTO=static
BROADCAST=x.y.z.255
IPADDR=x.y.z.13
NETMASK=255.255.255.0
NETWORK=x.y.z.0
GATEWAY=x.y.z.5
ONBOOT=yes
# bond1 device comprised of eth3 (PCI, default active) and eth1 (onboard, default standby)
DEVICE=bond1
BONDING_OPTS="mode=1 arp_interval=100 arp_ip_target=a.b.c.21,a.b.c.23 arp_validate=3 fail_over_mac=0"
BOOTPROTO=static
IPADDR=a.b.c.21
NETMASK=255.255.255.240
NETWORK=a.b.c.16
BROADCAST=a.b.c.31
ONBOOT=yes
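
To compare the two bond definitions option by option, the BONDING_OPTS
string can be split into a dictionary. This is a minimal sketch; the
helper names (parse_ifcfg, parse_bonding_opts, self_targets) are
hypothetical. The last helper lists ARP targets equal to the bond's own
IPADDR. Treating that as worth a second look is an assumption made here,
not an established diagnosis of the failure described in this post.

```python
# The bond1 definition from the listing above, BONDING_OPTS on one line.
BOND1 = """\
DEVICE=bond1
BONDING_OPTS="mode=1 arp_interval=100 arp_ip_target=a.b.c.21,a.b.c.23 arp_validate=3 fail_over_mac=0"
BOOTPROTO=static
IPADDR=a.b.c.21
NETMASK=255.255.255.240
"""


def parse_ifcfg(text):
    """Parse KEY=value lines, skipping comments and stripping double quotes."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        cfg[key] = value.strip('"')
    return cfg


def parse_bonding_opts(opts):
    """Turn 'mode=1 arp_interval=100 ...' into {'mode': '1', ...}."""
    return dict(item.split("=", 1) for item in opts.split())


def self_targets(cfg):
    """ARP targets identical to the bond's own IPADDR (assumed suspicious)."""
    opts = parse_bonding_opts(cfg.get("BONDING_OPTS", ""))
    targets = [t for t in opts.get("arp_ip_target", "").split(",") if t]
    return [t for t in targets if t == cfg.get("IPADDR")]


if __name__ == "__main__":
    cfg = parse_ifcfg(BOND1)
    print(parse_bonding_opts(cfg["BONDING_OPTS"]))
    print(self_targets(cfg))
```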


The only failover tests possible are the "ifdown ethX" kind, since the
server plugs into a backplane that leads to switches physically housed
in the chassis. For this reason tests with miimon (carrier presence at
layer 1) are not relevant, though they were tried earlier anyway.
(Tomorrow morning we will also run the cable-pull test, but it only
works for two of the 4 ports and requires changing the chassis
settings, which I decided not to do after all, since other systems
live in there too.) Several combinations of options for the bonding
module were tried; these are among the latest. The client prefers
active-backup mode; in theory round-robin (mode 0) should work too,
but I am not entirely convinced that it would not cause problems later
on (an Oracle RAC is going to end up on this machine).

Parameter combinations tried:
fail_over_mac=0, 1 or 2;
arp_validate=0, 1, 2 or 3 (I don't know what "tricks" were played at
the switch level);
"miimon=100" was also tried instead of "arp_interval=100"; same result.
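
For reference, the ARP-monitoring combinations listed above form a 3 x 4
grid. A minimal sketch that enumerates them as BONDING_OPTS strings
follows; the function name bonding_opts_matrix and the default monitor
string are illustrative assumptions, and the miimon variant is left out.

```python
from itertools import product

FAIL_OVER_MAC = (0, 1, 2)
ARP_VALIDATE = (0, 1, 2, 3)


def bonding_opts_matrix(monitor="arp_interval=100 arp_ip_target=a.b.c.23"):
    """Return one BONDING_OPTS string per (fail_over_mac, arp_validate) pair."""
    return [
        f"mode=1 {monitor} fail_over_mac={fom} arp_validate={av}"
        for fom, av in product(FAIL_OVER_MAC, ARP_VALIDATE)
    ]


if __name__ == "__main__":
    for opts in bonding_opts_matrix():
        print(f'BONDING_OPTS="{opts}"')
```

Walking the list in order makes it easier to record which combination
was active during each "ifdown" test.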

The result:

Test started from another machine (each of the two network segments
is shared by both machines):

[root@host02 ~]# ping a.b.c.21
PING a.b.c.21 (a.b.c.21) 56(84) bytes of data.
64 bytes from a.b.c.21: icmp_seq=1 ttl=64 time=0.168 ms
64 bytes from a.b.c.21: icmp_seq=2 ttl=64 time=0.211 ms
64 bytes from a.b.c.21: icmp_seq=3 ttl=64 time=0.182 ms
64 bytes from a.b.c.21: icmp_seq=4 ttl=64 time=0.171 ms
64 bytes from a.b.c.21: icmp_seq=5 ttl=64 time=0.203 ms
64 bytes from a.b.c.21: icmp_seq=6 ttl=64 time=0.180 ms
64 bytes from a.b.c.21: icmp_seq=7 ttl=64 time=0.181 ms
[at this point ifdown is run on host01]


On host01 (the test machine):
[root@host01 network-scripts]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth3
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 100
ARP IP target/s (n.n.n.n form): a.b.c.21, a.b.c.23

Slave Interface: eth1
MII Status: down
Link Failure Count: 3
Permanent HW addr: 98:4b:e1:5e:1e:80

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:17:a4:77:04:1e

[[[so far everything looks fine]]]

[root@host01 network-scripts]# ifdown eth3

[root@host01 network-scripts]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: None
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 100
ARP IP target/s (n.n.n.n form): a.b.c.21, a.b.c.23

Slave Interface: eth1
MII Status: down
Link Failure Count: 3
Permanent HW addr: 98:4b:e1:5e:1e:80

...and it died. No attempt at 'ifup eth1' (which should have taken
over the traffic) succeeds.
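
The two /proc/net/bonding/bond1 captures can be compared mechanically.
The sketch below is a minimal parser, assuming the v3.5.0 output layout
shown in this post; parse_bonding_status and has_usable_slave are
hypothetical helper names. Note that in both captures eth1 is already
reported with MII Status "down" and a Link Failure Count of 3.

```python
# Second capture from this post, taken after "ifdown eth3".
AFTER_IFDOWN = """\
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: None
MII Status: up
ARP Polling Interval (ms): 100
ARP IP target/s (n.n.n.n form): a.b.c.21, a.b.c.23

Slave Interface: eth1
MII Status: down
Link Failure Count: 3
Permanent HW addr: 98:4b:e1:5e:1e:80
"""


def parse_bonding_status(text):
    """Extract the active slave and per-slave link state from the dump."""
    status = {"active": None, "slaves": {}}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Currently Active Slave":
            status["active"] = None if value == "None" else value
        elif key == "Slave Interface":
            current = value
            status["slaves"][current] = {}
        elif current is not None and key == "MII Status":
            # Only per-slave MII lines land here; the bond-level one precedes them.
            status["slaves"][current]["mii"] = value
        elif current is not None and key == "Link Failure Count":
            status["slaves"][current]["failures"] = int(value)
    return status


def has_usable_slave(status):
    """True if at least one slave reports MII Status 'up'."""
    return any(s.get("mii") == "up" for s in status["slaves"].values())


if __name__ == "__main__":
    st = parse_bonding_status(AFTER_IFDOWN)
    print(st["active"], st["slaves"], has_usable_slave(st))
```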

Distribution: RHEL 5, fully up to date, no extra fiddling.

Any hints?



Thanks a lot!

--
Ave
http://flying.prwave.ro
_______________________________________________
RLUG mailing list
[email protected]
http://lists.lug.ro/mailman/listinfo/rlug
