Greetings,

After two days of torment and archaeological digging, I find myself forced to ask the question here as well. So:

We have X machines, model HP BL460 G7, i.e. blades. Each one has two network cards with two ports apiece, namely:

[root@host01 ~]# lspci | grep -i ether
02:00.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)
02:00.1 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)
09:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet (rev 20)
09:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet (rev 20)

Obviously I wanted to pair up ports from different cards, to avoid unpleasant situations such as "an Ethernet controller has died". Conclusion: bond0 = eth0 + eth2 and bond1 = eth1 + eth3.

Definitions:

[root@host01 network-scripts]# cat ifcfg-eth*
# Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet
DEVICE=eth0
BOOTPROTO=none
SLAVE=yes
MASTER=bond0
ONBOOT=yes

# Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet
DEVICE=eth1
ONBOOT=yes
HOTPLUG=no
SLAVE=yes
MASTER=bond1

# Emulex Corporation OneConnect 10Gb NIC (be3)
DEVICE=eth2
ONBOOT=yes
SLAVE=yes
MASTER=bond0

# Emulex Corporation OneConnect 10Gb NIC (be3)
DEVICE=eth3
ONBOOT=yes
HOTPLUG=no
SLAVE=yes
MASTER=bond1

[root@host01 network-scripts]# cat ifcfg-bond*
# bond0 device comprised of eth2 (PCI, default active) and eth0 (onboard, default standby)
DEVICE=bond0
BONDING_OPTS="mode=1 arp_interval=100 arp_ip_target=x.y.z.5 fail_over_mac=1 arp_validate=1"
BOOTPROTO=static
BROADCAST=x.y.z.255
IPADDR=x.y.z.13
NETMASK=255.255.255.0
NETWORK=x.y.z.0
GATEWAY=x.y.z.5
ONBOOT=yes

# bond1 device comprised of eth3 (PCI, default active) and eth1 (onboard, default standby)
DEVICE=bond1
BONDING_OPTS="mode=1 arp_interval=100 arp_ip_target=a.b.c.21,a.b.c.23 arp_validate=3 fail_over_mac=0"
BOOTPROTO=static
IPADDR=a.b.c.21
NETMASK=255.255.255.240
NETWORK=a.b.c.16
BROADCAST=a.b.c.31
ONBOOT=yes

The only failover tests possible are based on "ifdown ethX", because the server plugs into a backplane that leads to switches physically located in the chassis. For this reason the miimon tests (carrier presence at layer 1) are not relevant, but they were tried earlier anyway. (Tomorrow morning we will also do the cable-pull test, but it only works for two of the four ports and requires changing the chassis settings, which I decided not to do after all, since there are other systems in there too.)

Several combinations of options for the bonding module were tried; the ones above are among the latest. The customer prefers active-backup mode; in theory round-robin (mode 0) should work as well, but I am not entirely convinced that problems won't catch up with me later (an Oracle RAC is going to end up on this machine).

Parameter combinations: fail_over_mac=0, 1 or 2; arp_validate=0, 1, 2 or 3 (I don't know what "tricks" were done at the switch level); "miimon=100" was also tried in place of "arp_interval=100"; same result.

The result:

Test started from another machine (each of the two network segments is shared between the two hosts):

[root@host02 ~]# ping a.b.c.21
PING a.b.c.21 (a.b.c.21) 56(84) bytes of data.
64 bytes from a.b.c.21: icmp_seq=1 ttl=64 time=0.168 ms
64 bytes from a.b.c.21: icmp_seq=2 ttl=64 time=0.211 ms
64 bytes from a.b.c.21: icmp_seq=3 ttl=64 time=0.182 ms
64 bytes from a.b.c.21: icmp_seq=4 ttl=64 time=0.171 ms
64 bytes from a.b.c.21: icmp_seq=5 ttl=64 time=0.203 ms
64 bytes from a.b.c.21: icmp_seq=6 ttl=64 time=0.180 ms
64 bytes from a.b.c.21: icmp_seq=7 ttl=64 time=0.181 ms
[ifdown is run on host01 at this point]

On host01 (the test machine):

[root@chost01 network-scripts]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth3
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 100
ARP IP target/s (n.n.n.n form): a.b.c.21, a.b.c.23

Slave Interface: eth1
MII Status: down
Link Failure Count: 3
Permanent HW addr: 98:4b:e1:5e:1e:80

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:17:a4:77:04:1e

[[[everything is fine up to here]]]

[root@host01 network-scripts]# ifdown eth3
[root@host01 network-scripts]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: None
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 100
ARP IP target/s (n.n.n.n form): a.b.c.21, a.b.c.23

Slave Interface: eth1
MII Status: down
Link Failure Count: 3
Permanent HW addr: 98:4b:e1:5e:1e:80

...and it died. No attempt at 'ifup eth1' (which should have taken over the traffic) succeeds.

The distribution is RHEL 5, fully updated, no extra fiddling.

Any hints?

Thanks a lot!

-- 
Ave
http://flying.prwave.ro
_______________________________________________
RLUG mailing list
[email protected]
http://lists.lug.ro/mailman/listinfo/rlug
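
P.S. For anyone repeating the test: comparing the /proc/net/bonding/bond1 dumps by eye gets tedious, so here is a minimal sketch that reduces such a dump to the active slave and the per-slave status. It assumes Python 3 on whatever machine the text is copied to; the function name parse_bonding_status and the SAMPLE_AFTER_IFDOWN constant are made up for illustration, and the sample text is the second dump from this message. It only summarises the status text and does not diagnose the failover problem.

```python
# Sketch only: summarise the text of /proc/net/bonding/<bond>.
# parse_bonding_status is a hypothetical helper name, not part of any tool.

SAMPLE_AFTER_IFDOWN = """\
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: None
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 100
ARP IP target/s (n.n.n.n form): a.b.c.21, a.b.c.23

Slave Interface: eth1
MII Status: down
Link Failure Count: 3
Permanent HW addr: 98:4b:e1:5e:1e:80
"""


def parse_bonding_status(text):
    """Return {'active_slave': name or None, 'slaves': {name: {...}}}."""
    summary = {"active_slave": None, "slaves": {}}
    current = None  # slave section currently being read, if any
    for line in text.splitlines():
        if ":" not in line:
            continue  # blank or unstructured line
        key, value = line.split(":", 1)  # split on the first colon only
        key, value = key.strip(), value.strip()
        if key == "Currently Active Slave":
            summary["active_slave"] = None if value == "None" else value
        elif key == "Slave Interface":
            current = value
            summary["slaves"][current] = {}
        elif current is not None and key in ("MII Status", "Link Failure Count"):
            # Lines before the first "Slave Interface" describe the bond
            # itself and are deliberately skipped here.
            summary["slaves"][current][key] = value
    return summary


status = parse_bonding_status(SAMPLE_AFTER_IFDOWN)
```

With the dump above, status["active_slave"] is None and eth1 is the only slave listed, which matches what the second dump in this message shows after "ifdown eth3".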
