Hi Robert,
I didn't, but I can retry it now to reproduce; this server isn't
particularly critical at the moment...
In the working state:
root@anto:~ # dladm show-aggr -L
LINK PORT AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr0 igb0 yes yes yes yes no no
-- igb1 yes yes yes yes no no
-- igb2 yes yes yes yes no no
-- igb3 yes yes yes yes no no
root@anto:~ # dladm show-aggr -x
LINK PORT SPEED DUPLEX STATE ADDRESS PORTSTATE
aggr0 -- 1000Mb full up 0:14:4f:e7:39:6 --
igb0 1000Mb full up 0:14:4f:e7:39:6 attached
igb1 1000Mb full up 0:14:4f:e7:39:7 attached
igb2 1000Mb full up 0:14:4f:e7:39:8 attached
igb3 1000Mb full up 0:14:4f:e7:39:9 attached
After rebooting into 20160818T234814Z:
root@anto:~ # dladm show-aggr -L
LINK PORT AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr0 igb0 yes yes yes yes no no
-- igb1 yes no no no yes no
-- igb2 yes yes yes yes no no
-- igb3 yes no no no yes no
root@anto:~ # dladm show-aggr -x
LINK PORT SPEED DUPLEX STATE ADDRESS PORTSTATE
aggr0 -- 1000Mb full up 0:14:4f:e7:39:6 --
igb0 1000Mb full up 0:14:4f:e7:39:6 attached
igb1 1000Mb full up 0:14:4f:e7:39:7 attached
igb2 1000Mb full up 0:14:4f:e7:39:8 attached
igb3 1000Mb full up 0:14:4f:e7:39:9 attached
Note that this differs from last time: two of the links are up
now, and seen as such from the other side as well. Running a snoop on
igb1 (which didn't come up) shows LACP packets going out from the
SmartOS side but nothing from the Juniper side:
ETHER: ----- Ether Header -----
ETHER:
ETHER: Packet 12 arrived at 8:33:34.14399
ETHER: Packet size = 124 bytes
ETHER: Destination = 1:80:c2:0:0:2, Standard MAC Group Address
ETHER: Source = 0:14:4f:e7:39:7,
ETHER: Ethertype = 8809 (Unknown)
ETHER:
"show interface" on the firewall side shows 0 pps input and 1 pps out,
indicating that it sends LACP PDUs of its own but doesn't see any
packets from the SmartOS side, i.e. the opposite of what snoop sees.
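
For reference, snoop prints ethertype 8809 as "Unknown", but 0x8809 is
the Slow Protocols ethertype and 1:80:c2:0:0:2 is the Slow Protocols
multicast address, so these frames are consistent with LACPDUs. A
minimal Python sketch of that check, assuming an untagged frame held as
raw bytes (the helper names parse_mac and is_lacpdu are made up for
illustration, and the payload is zero-filled, not a real capture):

```python
# Constants from IEEE 802.3 Slow Protocols / LACP.
SLOW_PROTOCOLS_MAC = bytes.fromhex("0180c2000002")
SLOW_PROTOCOLS_ETHERTYPE = 0x8809
LACP_SUBTYPE = 0x01  # 0x02 would be the Marker protocol


def parse_mac(text):
    """Convert a snoop-style MAC such as '0:14:4f:e7:39:7' to 6 bytes."""
    return bytes(int(part, 16) for part in text.split(":"))


def is_lacpdu(frame):
    """Return True if an untagged Ethernet frame looks like an LACPDU."""
    if len(frame) < 15:
        return False
    dst = frame[0:6]
    ethertype = int.from_bytes(frame[12:14], "big")
    subtype = frame[14]
    return (
        dst == SLOW_PROTOCOLS_MAC
        and ethertype == SLOW_PROTOCOLS_ETHERTYPE
        and subtype == LACP_SUBTYPE
    )


# Hypothetical frame built from the addresses in the snoop output above;
# everything after the subtype byte is zero padding, not captured data.
frame = (
    parse_mac("1:80:c2:0:0:2")
    + parse_mac("0:14:4f:e7:39:7")
    + SLOW_PROTOCOLS_ETHERTYPE.to_bytes(2, "big")
    + bytes([LACP_SUBTYPE])
    + bytes(109)
)
print(len(frame), is_lacpdu(frame))
```

The 124-byte packet size reported by snoop also fits: a 14-byte
Ethernet header plus a 110-byte LACPDU.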
Flapping an interface from the firewall side made it come up. After boot:
root@anto:~ # dladm show-aggr -x
LINK PORT SPEED DUPLEX STATE ADDRESS PORTSTATE
aggr0 -- 1000Mb full up 0:14:4f:e7:39:6 --
igb0 1000Mb full up 0:14:4f:e7:39:6 attached
igb1 1000Mb full up 0:14:4f:e7:39:7 attached
igb2 1000Mb full up 0:14:4f:e7:39:8 attached
igb3 1000Mb full up 0:14:4f:e7:39:9 attached
root@anto:~ # dladm show-aggr -L
LINK PORT AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr0 igb0 yes yes yes yes no no
-- igb1 yes no no no yes no
-- igb2 yes yes yes yes no no
-- igb3 yes no no no yes no
Disable the port where igb3 is connected on the firewall side...
root@anto:~ # dladm show-aggr -x
LINK PORT SPEED DUPLEX STATE ADDRESS PORTSTATE
aggr0 -- 1000Mb full up 0:14:4f:e7:39:6 --
igb0 1000Mb full up 0:14:4f:e7:39:6 attached
igb1 1000Mb full up 0:14:4f:e7:39:7 attached
igb2 1000Mb full up 0:14:4f:e7:39:8 attached
igb3 0Mb half down 0:14:4f:e7:39:9 standby
Enable it again:
root@anto:~ # dladm show-aggr -x
LINK PORT SPEED DUPLEX STATE ADDRESS PORTSTATE
aggr0 -- 1000Mb full up 0:14:4f:e7:39:6 --
igb0 1000Mb full up 0:14:4f:e7:39:6 attached
igb1 1000Mb full up 0:14:4f:e7:39:7 attached
igb2 1000Mb full up 0:14:4f:e7:39:8 attached
igb3 1000Mb full up 0:14:4f:e7:39:9 attached
root@anto:~ # dladm show-aggr -L
LINK PORT AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr0 igb0 yes yes yes yes no no
-- igb1 yes no no no yes no
-- igb2 yes yes yes yes no no
-- igb3 yes yes yes yes no no
Traffic flows.
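
To make the stuck ports easier to spot after each reboot, something
like the following could be used. It is only a rough Python sketch that
assumes the human-readable column layout of "dladm show-aggr -L" shown
above (eight whitespace-separated fields per port line); the function
name unsynced_ports is made up for illustration:

```python
def unsynced_ports(output):
    """Return ports that are not in sync, or that report defaulted or
    expired partner information, from 'dladm show-aggr -L' text."""
    ports = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the header, shell prompts, and anything with another shape.
        if len(fields) != 8 or fields[0] == "LINK":
            continue
        _link, port, _aggr, sync, _coll, _dist, defaulted, expired = fields
        if sync != "yes" or defaulted == "yes" or expired == "yes":
            ports.append(port)
    return ports


# Output copied from the failing boot earlier in this mail.
SAMPLE = """\
root@anto:~ # dladm show-aggr -L
LINK PORT AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr0 igb0 yes yes yes yes no no
-- igb1 yes no no no yes no
-- igb2 yes yes yes yes no no
-- igb3 yes no no no yes no
"""

print(unsynced_ports(SAMPLE))  # ['igb1', 'igb3']
```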
Any ideas for further troubleshooting here? To me it looks like
something related to the igb link layer rather than anything
LACP-specific.
Rebooting back into 20160804T173241Z, it's solid.
//jb
2016-08-23 23:17 GMT+02:00 Robert Mustacchi <[email protected]>:
> On 8/23/16 0:08 , Jakob Borg wrote:
>> Hi,
>>
>> I upgraded a machine from SmartOS release 20160804T173241Z to
>> 20160818T234814Z today. After rebooting, there was no network
>> connectivity. After some debugging it turned out that LACP link
>> aggregation didn't come up.
>>
>> The server is an old Sun X4170, using the built in four port igb card,
>> connected to a Juniper SRX firewall. In the failed state, all links
>> were marked as physically up but the LACP didn't negotiate:
>>
>> root@anto:~ # uname -a
>> SunOS anto.nym.se 5.11 joyent_20160818T234814Z i86pc i386 i86pc
>>
>> root@anto:~ # dladm show-link
>> LINK CLASS MTU STATE BRIDGE OVER
>> igb0 phys 1500 up -- --
>> igb2 phys 1500 up -- --
>> igb1 phys 1500 up -- --
>> igb3 phys 1500 up -- --
>> aggr0 aggr 1500 up -- igb0 igb1 igb2 igb3
>> net0 vnic 1500 ? -- aggr0
>> net0 vnic 1500 ? -- aggr0
>> eth0 vnic 1500 ? -- aggr0
>> eth0 vnic 1500 ? -- aggr0
>> eth0 vnic 1500 ? -- aggr0
>> net0 vnic 1500 ? -- aggr0
>> net0 vnic 1500 ? -- aggr0
>> net0 vnic 1500 ? -- aggr0
>>
>> root@anto:~ # dladm show-aggr -L
>> LINK PORT AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
>> aggr0 igb0 yes no no no yes no
>> -- igb1 yes no no no yes no
>> -- igb2 yes no no no yes no
>> -- igb3 yes no no no yes no
>>
>> On the firewall side, interfaces were up but indicating no LACP traffic:
>>
>> jb@hlv-srx240> show lacp interfaces ae0
>> Aggregated interface: ae0
>> LACP state: Role Exp Def Dist Col Syn Aggr Timeout Activity
>> ge-0/0/12 Actor No Yes No No No Yes Fast Active
>> ge-0/0/12 Partner No Yes No No No Yes Fast Passive
>> ge-0/0/13 Actor No Yes No No No Yes Fast Active
>> ge-0/0/13 Partner No Yes No No No Yes Fast Passive
>> ge-0/0/14 Actor No Yes No No No Yes Fast Active
>> ge-0/0/14 Partner No Yes No No No Yes Fast Passive
>> ge-0/0/15 Actor No Yes No No No Yes Fast Active
>> ge-0/0/15 Partner No Yes No No No Yes Fast Passive
>> LACP protocol: Receive State Transmit State Mux State
>> ge-0/0/12 Defaulted Fast periodic Detached
>> ge-0/0/13 Defaulted Fast periodic Detached
>> ge-0/0/14 Defaulted Fast periodic Detached
>> ge-0/0/15 Defaulted Fast periodic Detached
>>
>> The configuration is fairly straightforward:
>>
>> root@anto:~ # egrep aggr\|admin /usbkey/config
>> aggr0_aggr=0:14:4f:e7:39:6,0:14:4f:e7:39:7,0:14:4f:e7:39:8,0:14:4f:e7:39:9
>> aggr0_lacp_mode=active
>> # admin_nic is the nic admin_ip will be connected to for headnode zones.
>> admin_nic=aggr0
>> admin_ip=172.16.32.3
>> admin_ip6=2001:470:deeb:32::3/64
>> admin_netmask=255.255.255.0
>> admin_network=...
>> admin_gateway=172.16.32.11
>> internal_nic=aggr0
>>
>> After rebooting back into 20160804T173241Z, everything is fine again -
>> LACP comes up immediately.
>>
>> Any ideas? The changelog didn't mention anything specifically relevant
>> (searching for "lacp", "igb") that I could see...
>
> Did you also happen to grab dladm show-aggr -x when you were on the
> system where it wasn't working?
>
> What does dladm show-aggr -L and dladm show-aggr -x show now that you're
> on the system where it's working?
>
> Thanks,
> Robert
>