I've been looking through the diff on illumos-joyent and I fail to see
anything obviously relevant. :/ If no one else is seeing this, I'll try
to get a build environment up so I can bisect this and figure it
out...
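
For the bisect it would help to have an automatic good/bad check per
build. A rough sketch of what that could look like is below. It assumes
the column layout of the "dladm show-aggr -L" output quoted further down;
the function names and the SAMPLE text are made up for illustration and
this has not been run against a live system.

```python
import subprocess

# Column names, in the order printed by `dladm show-aggr -L` in the output quoted below.
FIELDS = ["link", "port", "aggregatable", "sync", "coll", "dist", "defaulted", "expired"]


def parse_show_aggr_lacp(text):
    """Parse the table printed by `dladm show-aggr -L` into a list of dicts."""
    rows = []
    current_link = None
    for line in text.splitlines():
        cols = line.split()
        # Skip blank lines, the header row, and anything that is not an 8-column row.
        if len(cols) != len(FIELDS) or cols[0] == "LINK":
            continue
        row = dict(zip(FIELDS, cols))
        # Continuation rows show "--" in the LINK column; reuse the previous link name.
        if row["link"] == "--":
            row["link"] = current_link
        else:
            current_link = row["link"]
        rows.append(row)
    return rows


def unsynced_ports(rows):
    """Return ports that are out of sync or still using defaulted partner info."""
    return [r["port"] for r in rows if r["sync"] != "yes" or r["defaulted"] == "yes"]


def check_live():
    """Run dladm locally (illumos/SmartOS only) and list the ports that look bad."""
    out = subprocess.run(
        ["dladm", "show-aggr", "-L"],
        capture_output=True, text=True, check=True,
    ).stdout
    return unsynced_ports(parse_show_aggr_lacp(out))


# Sample copied from the failing state quoted below.
SAMPLE = """\
LINK        PORT         AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr0       igb0         yes          yes  yes  yes  no        no
--          igb1         yes          no   no   no   yes       no
--          igb2         yes          yes  yes  yes  no        no
--          igb3         yes          no   no   no   yes       no
"""

if __name__ == "__main__":
    print(unsynced_ports(parse_show_aggr_lacp(SAMPLE)))
```

An empty list from check_live() after boot would count as a good build,
anything else as bad.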

//jb

2016-08-24 10:54 GMT+02:00 Jakob Borg <[email protected]>:
> Hi Robert,
>
> I didn't, but I can retry it now to reproduce; this server isn't
> particularly critical at the moment...
>
> In the working state:
>
> root@anto:~ # dladm show-aggr -L
> LINK        PORT         AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
> aggr0       igb0         yes          yes  yes  yes  no        no
> --          igb1         yes          yes  yes  yes  no        no
> --          igb2         yes          yes  yes  yes  no        no
> --          igb3         yes          yes  yes  yes  no        no
>
> root@anto:~ # dladm show-aggr -x
> LINK        PORT           SPEED DUPLEX   STATE     ADDRESS            PORTSTATE
> aggr0       --             1000Mb full    up        0:14:4f:e7:39:6    --
>             igb0           1000Mb full    up        0:14:4f:e7:39:6    attached
>             igb1           1000Mb full    up        0:14:4f:e7:39:7    attached
>             igb2           1000Mb full    up        0:14:4f:e7:39:8    attached
>             igb3           1000Mb full    up        0:14:4f:e7:39:9    attached
>
> After rebooting into 20160818T234814Z:
>
> root@anto:~ # dladm show-aggr -L
> LINK        PORT         AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
> aggr0       igb0         yes          yes  yes  yes  no        no
> --          igb1         yes          no   no   no   yes       no
> --          igb2         yes          yes  yes  yes  no        no
> --          igb3         yes          no   no   no   yes       no
>
> root@anto:~ # dladm show-aggr -x
> LINK        PORT           SPEED DUPLEX   STATE     ADDRESS            PORTSTATE
> aggr0       --             1000Mb full    up        0:14:4f:e7:39:6    --
>             igb0           1000Mb full    up        0:14:4f:e7:39:6    attached
>             igb1           1000Mb full    up        0:14:4f:e7:39:7    attached
>             igb2           1000Mb full    up        0:14:4f:e7:39:8    attached
>             igb3           1000Mb full    up        0:14:4f:e7:39:9    attached
>
> Note that this is different from last time: two of the links are up
> now, and are seen as such from the other side as well. Running a snoop
> on igb1 (which didn't come up) shows LACP packets going out from the
> SmartOS side but nothing from the Juniper side:
>
> ETHER:  ----- Ether Header -----
> ETHER:
> ETHER:  Packet 12 arrived at 8:33:34.14399
> ETHER:  Packet size = 124 bytes
> ETHER:  Destination = 1:80:c2:0:0:2, Standard MAC Group Address
> ETHER:  Source      = 0:14:4f:e7:39:7,
> ETHER:  Ethertype = 8809 (Unknown)
> ETHER:
>
> "show interface" on the firewall side shows 0 pps input and 1 pps out,
> indicating that it sends LACP PDUs of its own but doesn't see any
> packets from the SmartOS side, i.e. the opposite of what snoop sees.
>
> Flapping an interface from the firewall side made it come up. After boot:
>
> root@anto:~ # dladm show-aggr -x
> LINK        PORT           SPEED DUPLEX   STATE     ADDRESS            PORTSTATE
> aggr0       --             1000Mb full    up        0:14:4f:e7:39:6    --
>             igb0           1000Mb full    up        0:14:4f:e7:39:6    attached
>             igb1           1000Mb full    up        0:14:4f:e7:39:7    attached
>             igb2           1000Mb full    up        0:14:4f:e7:39:8    attached
>             igb3           1000Mb full    up        0:14:4f:e7:39:9    attached
>
> root@anto:~ # dladm show-aggr -L
> LINK        PORT         AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
> aggr0       igb0         yes          yes  yes  yes  no        no
> --          igb1         yes          no   no   no   yes       no
> --          igb2         yes          yes  yes  yes  no        no
> --          igb3         yes          no   no   no   yes       no
>
> Disable the port where igb3 is connected on the firewall side...
>
> root@anto:~ # dladm show-aggr -x
> LINK        PORT           SPEED DUPLEX   STATE     ADDRESS            PORTSTATE
> aggr0       --             1000Mb full    up        0:14:4f:e7:39:6    --
>             igb0           1000Mb full    up        0:14:4f:e7:39:6    attached
>             igb1           1000Mb full    up        0:14:4f:e7:39:7    attached
>             igb2           1000Mb full    up        0:14:4f:e7:39:8    attached
>             igb3           0Mb  half      down      0:14:4f:e7:39:9    standby
>
> Enable it again:
>
> root@anto:~ # dladm show-aggr -x
> LINK        PORT           SPEED DUPLEX   STATE     ADDRESS            PORTSTATE
> aggr0       --             1000Mb full    up        0:14:4f:e7:39:6    --
>             igb0           1000Mb full    up        0:14:4f:e7:39:6    attached
>             igb1           1000Mb full    up        0:14:4f:e7:39:7    attached
>             igb2           1000Mb full    up        0:14:4f:e7:39:8    attached
>             igb3           1000Mb full    up        0:14:4f:e7:39:9    attached
>
> root@anto:~ # dladm show-aggr -L
> LINK        PORT         AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
> aggr0       igb0         yes          yes  yes  yes  no        no
> --          igb1         yes          no   no   no   yes       no
> --          igb2         yes          yes  yes  yes  no        no
> --          igb3         yes          yes  yes  yes  no        no
>
> Traffic flows.
>
> Any ideas for further troubleshooting here? To me it looks like
> something in the igb link layer rather than anything LACP-specific.
>
> Rebooting back into 20160804T173241Z, it's solid.
>
> //jb
>
> 2016-08-23 23:17 GMT+02:00 Robert Mustacchi <[email protected]>:
>> On 8/23/16 0:08 , Jakob Borg wrote:
>>> Hi,
>>>
>>> I upgraded a machine from SmartOS release 20160804T173241Z to
>>> 20160818T234814Z today. After rebooting, there was no network
>>> connectivity. After some debugging it turned out that LACP link
>>> aggregation didn't come up.
>>>
>>> The server is an old Sun X4170, using the built-in four-port igb card,
>>> connected to a Juniper SRX firewall. In the failed state, all links
>>> were marked as physically up but LACP didn't negotiate:
>>>
>>> root@anto:~ # uname -a
>>> SunOS anto.nym.se 5.11 joyent_20160818T234814Z i86pc i386 i86pc
>>>
>>> root@anto:~ # dladm show-link
>>> LINK        CLASS     MTU    STATE    BRIDGE     OVER
>>> igb0        phys      1500   up       --         --
>>> igb2        phys      1500   up       --         --
>>> igb1        phys      1500   up       --         --
>>> igb3        phys      1500   up       --         --
>>> aggr0       aggr      1500   up       --         igb0 igb1 igb2 igb3
>>> net0        vnic      1500   ?        --         aggr0
>>> net0        vnic      1500   ?        --         aggr0
>>> eth0        vnic      1500   ?        --         aggr0
>>> eth0        vnic      1500   ?        --         aggr0
>>> eth0        vnic      1500   ?        --         aggr0
>>> net0        vnic      1500   ?        --         aggr0
>>> net0        vnic      1500   ?        --         aggr0
>>> net0        vnic      1500   ?        --         aggr0
>>>
>>> root@anto:~ # dladm show-aggr -L
>>> LINK        PORT         AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
>>> aggr0       igb0         yes          no   no   no   yes       no
>>> --          igb1         yes          no   no   no   yes       no
>>> --          igb2         yes          no   no   no   yes       no
>>> --          igb3         yes          no   no   no   yes       no
>>>
>>> On the firewall side, interfaces were up but indicating no LACP traffic:
>>>
>>> jb@hlv-srx240> show lacp interfaces ae0
>>> Aggregated interface: ae0
>>>     LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
>>>       ge-0/0/12      Actor    No   Yes    No   No   No   Yes     Fast    Active
>>>       ge-0/0/12    Partner    No   Yes    No   No   No   Yes     Fast   Passive
>>>       ge-0/0/13      Actor    No   Yes    No   No   No   Yes     Fast    Active
>>>       ge-0/0/13    Partner    No   Yes    No   No   No   Yes     Fast   Passive
>>>       ge-0/0/14      Actor    No   Yes    No   No   No   Yes     Fast    Active
>>>       ge-0/0/14    Partner    No   Yes    No   No   No   Yes     Fast   Passive
>>>       ge-0/0/15      Actor    No   Yes    No   No   No   Yes     Fast    Active
>>>       ge-0/0/15    Partner    No   Yes    No   No   No   Yes     Fast   Passive
>>>     LACP protocol:        Receive State  Transmit State          Mux State
>>>       ge-0/0/12               Defaulted   Fast periodic           Detached
>>>       ge-0/0/13               Defaulted   Fast periodic           Detached
>>>       ge-0/0/14               Defaulted   Fast periodic           Detached
>>>       ge-0/0/15               Defaulted   Fast periodic           Detached
>>>
>>> The configuration is fairly straightforward:
>>>
>>> root@anto:~ # egrep aggr\|admin /usbkey/config
>>> aggr0_aggr=0:14:4f:e7:39:6,0:14:4f:e7:39:7,0:14:4f:e7:39:8,0:14:4f:e7:39:9
>>> aggr0_lacp_mode=active
>>> # admin_nic is the nic admin_ip will be connected to for headnode zones.
>>> admin_nic=aggr0
>>> admin_ip=172.16.32.3
>>> admin_ip6=2001:470:deeb:32::3/64
>>> admin_netmask=255.255.255.0
>>> admin_network=...
>>> admin_gateway=172.16.32.11
>>> internal_nic=aggr0
>>>
>>> After rebooting back into 20160804T173241Z, everything is fine again -
>>> LACP comes up immediately.
>>>
>>> Any ideas? The changelog didn't mention anything specifically relevant
>>> (searching for "lacp", "igb") that I could see...
>> 
>> Did you also happen to grab dladm show-aggr -x when you were on the
>> system where it wasn't working?
>> 
>> What does dladm show-aggr -L and dladm show-aggr -x show now that you're
>> on the system where it's working?
>> 
>> Thanks,
>> Robert
>> 

