Hi Robert,

I didn't, but I can retry it now to reproduce; this server isn't
particularly critical at the moment...

In the working state:

root@anto:~ # dladm show-aggr -L
LINK        PORT         AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr0       igb0         yes          yes  yes  yes  no        no
--          igb1         yes          yes  yes  yes  no        no
--          igb2         yes          yes  yes  yes  no        no
--          igb3         yes          yes  yes  yes  no        no

root@anto:~ # dladm show-aggr -x
LINK        PORT           SPEED DUPLEX   STATE     ADDRESS            PORTSTATE
aggr0       --             1000Mb full    up        0:14:4f:e7:39:6    --
            igb0           1000Mb full    up        0:14:4f:e7:39:6    attached
            igb1           1000Mb full    up        0:14:4f:e7:39:7    attached
            igb2           1000Mb full    up        0:14:4f:e7:39:8    attached
            igb3           1000Mb full    up        0:14:4f:e7:39:9    attached

After rebooting into 20160818T234814Z:

root@anto:~ # dladm show-aggr -L
LINK        PORT         AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr0       igb0         yes          yes  yes  yes  no        no
--          igb1         yes          no   no   no   yes       no
--          igb2         yes          yes  yes  yes  no        no
--          igb3         yes          no   no   no   yes       no

root@anto:~ # dladm show-aggr -x
LINK        PORT           SPEED DUPLEX   STATE     ADDRESS            PORTSTATE
aggr0       --             1000Mb full    up        0:14:4f:e7:39:6    --
            igb0           1000Mb full    up        0:14:4f:e7:39:6    attached
            igb1           1000Mb full    up        0:14:4f:e7:39:7    attached
            igb2           1000Mb full    up        0:14:4f:e7:39:8    attached
            igb3           1000Mb full    up        0:14:4f:e7:39:9    attached

Note that this is different from last time: two of the links are up
now, and seen as such from the other side as well. Running snoop on
igb1 (which didn't come up) shows LACP packets going out from the
SmartOS side but nothing from the Juniper side:

ETHER:  ----- Ether Header -----
ETHER:
ETHER:  Packet 12 arrived at 8:33:34.14399
ETHER:  Packet size = 124 bytes
ETHER:  Destination = 1:80:c2:0:0:2, Standard MAC Group Address
ETHER:  Source      = 0:14:4f:e7:39:7,
ETHER:  Ethertype = 8809 (Unknown)
ETHER:

"show interface" on the firewall side shows 0 pps input and 1 pps out,
indicating that it sends LACP PDUs of its own but doesn't see any
packets from the SmartOS side, i.e. the opposite of what snoop sees.
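For reference, snoop labels ethertype 8809 as "Unknown", but 0x8809 is the
IEEE 802.3 Slow Protocols ethertype, and LACP is subtype 1 within it. Below
is a minimal sketch of how a captured frame could be classified as an
LACPDU. It assumes an untagged Ethernet frame without the FCS; the function
names and the example frame builder are hypothetical, and the payload is
zero padding rather than a real LACPDU.

```python
import struct

# Slow Protocols multicast address and ethertype (IEEE 802.3).
SLOW_PROTOCOLS_DST = bytes.fromhex("0180c2000002")
SLOW_PROTOCOLS_ETHERTYPE = 0x8809
LACP_SUBTYPE = 0x01


def is_lacp_frame(frame: bytes) -> bool:
    """Return True if a raw, untagged Ethernet frame looks like an LACPDU."""
    if len(frame) < 15:
        return False
    dst = frame[0:6]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    subtype = frame[14]
    return (
        dst == SLOW_PROTOCOLS_DST
        and ethertype == SLOW_PROTOCOLS_ETHERTYPE
        and subtype == LACP_SUBTYPE
    )


def build_example_frame() -> bytes:
    """Hypothetical 124-byte frame using the igb1 address seen in snoop."""
    src = bytes.fromhex("00144fe73907")  # 0:14:4f:e7:39:7
    header = SLOW_PROTOCOLS_DST + src + struct.pack("!H", SLOW_PROTOCOLS_ETHERTYPE)
    # Subtype byte followed by zero padding; not a valid LACPDU body.
    return header + bytes([LACP_SUBTYPE]) + bytes(109)
```

The 124-byte packet size reported by snoop matches a 14-byte Ethernet
header plus a 110-byte LACPDU.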

Flapping an interface from the firewall side made it come up. After boot:

root@anto:~ # dladm show-aggr -x
LINK        PORT           SPEED DUPLEX   STATE     ADDRESS            PORTSTATE
aggr0       --             1000Mb full    up        0:14:4f:e7:39:6    --
            igb0           1000Mb full    up        0:14:4f:e7:39:6    attached
            igb1           1000Mb full    up        0:14:4f:e7:39:7    attached
            igb2           1000Mb full    up        0:14:4f:e7:39:8    attached
            igb3           1000Mb full    up        0:14:4f:e7:39:9    attached

root@anto:~ # dladm show-aggr -L
LINK        PORT         AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr0       igb0         yes          yes  yes  yes  no        no
--          igb1         yes          no   no   no   yes       no
--          igb2         yes          yes  yes  yes  no        no
--          igb3         yes          no   no   no   yes       no

Disable the port where igb3 is connected on the firewall side...

root@anto:~ # dladm show-aggr -x
LINK        PORT           SPEED DUPLEX   STATE     ADDRESS            PORTSTATE
aggr0       --             1000Mb full    up        0:14:4f:e7:39:6    --
            igb0           1000Mb full    up        0:14:4f:e7:39:6    attached
            igb1           1000Mb full    up        0:14:4f:e7:39:7    attached
            igb2           1000Mb full    up        0:14:4f:e7:39:8    attached
            igb3           0Mb  half      down      0:14:4f:e7:39:9    standby

Enable it again:

root@anto:~ # dladm show-aggr -x
LINK        PORT           SPEED DUPLEX   STATE     ADDRESS            PORTSTATE
aggr0       --             1000Mb full    up        0:14:4f:e7:39:6    --
            igb0           1000Mb full    up        0:14:4f:e7:39:6    attached
            igb1           1000Mb full    up        0:14:4f:e7:39:7    attached
            igb2           1000Mb full    up        0:14:4f:e7:39:8    attached
            igb3           1000Mb full    up        0:14:4f:e7:39:9    attached

root@anto:~ # dladm show-aggr -L
LINK        PORT         AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr0       igb0         yes          yes  yes  yes  no        no
--          igb1         yes          no   no   no   yes       no
--          igb2         yes          yes  yes  yes  no        no
--          igb3         yes          yes  yes  yes  no        no

Traffic flows.

Any ideas for further troubleshooting here? To me it looks like
something related to the igb link layer rather than anything
LACP-specific.
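To make the before/after comparison easier to repeat, here is a minimal
sketch that parses "dladm show-aggr -L" output and lists the ports that
have not synchronised with the partner. It assumes the whitespace-separated
column layout shown in the output above; the function names are
hypothetical, and the sample text is copied from the failing state.

```python
from typing import Dict, List

SAMPLE = """\
LINK        PORT         AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr0       igb0         yes          yes  yes  yes  no        no
--          igb1         yes          no   no   no   yes       no
--          igb2         yes          yes  yes  yes  no        no
--          igb3         yes          no   no   no   yes       no
"""


def parse_show_aggr_lacp(text: str) -> List[Dict[str, str]]:
    """Parse 'dladm show-aggr -L' output into one dict per port."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    link = None
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            continue  # skip anything that does not match the header layout
        row = dict(zip(header, fields))
        if row["LINK"] == "--":
            row["LINK"] = link  # continuation row: inherit the aggregation name
        else:
            link = row["LINK"]
        rows.append(row)
    return rows


def unsynced_ports(rows: List[Dict[str, str]]) -> List[str]:
    """Ports that are not in sync or are running on defaulted partner info."""
    return [r["PORT"] for r in rows
            if r["SYNC"] != "yes" or r["DEFAULTED"] == "yes"]


if __name__ == "__main__":
    print(unsynced_ports(parse_show_aggr_lacp(SAMPLE)))
```

On the sample above this reports igb1 and igb3, the two ports marked as
defaulted.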

Rebooting back into 20160804T173241Z, it's solid.

//jb

2016-08-23 23:17 GMT+02:00 Robert Mustacchi <[email protected]>:
> On 8/23/16 0:08 , Jakob Borg wrote:
>> Hi,
>>
>> I upgraded a machine from SmartOS release 20160804T173241Z to
>> 20160818T234814Z today. After rebooting, there was no network
>> connectivity. After some debugging it turned out that LACP link
>> aggregation didn't come up.
>>
>> The server is an old Sun X4170, using the built in four port igb card,
>> connected to a Juniper SRX firewall. In the failed state, all links
>> were marked as physically up but the LACP didn't negotiate:
>>
>> root@anto:~ # uname -a
>> SunOS anto.nym.se 5.11 joyent_20160818T234814Z i86pc i386 i86pc
>>
>> root@anto:~ # dladm show-link
>> LINK        CLASS     MTU    STATE    BRIDGE     OVER
>> igb0        phys      1500   up       --         --
>> igb2        phys      1500   up       --         --
>> igb1        phys      1500   up       --         --
>> igb3        phys      1500   up       --         --
>> aggr0       aggr      1500   up       --         igb0 igb1 igb2 igb3
>> net0        vnic      1500   ?        --         aggr0
>> net0        vnic      1500   ?        --         aggr0
>> eth0        vnic      1500   ?        --         aggr0
>> eth0        vnic      1500   ?        --         aggr0
>> eth0        vnic      1500   ?        --         aggr0
>> net0        vnic      1500   ?        --         aggr0
>> net0        vnic      1500   ?        --         aggr0
>> net0        vnic      1500   ?        --         aggr0
>>
>> root@anto:~ # dladm show-aggr -L
>> LINK        PORT         AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
>> aggr0       igb0         yes          no   no   no   yes       no
>> --          igb1         yes          no   no   no   yes       no
>> --          igb2         yes          no   no   no   yes       no
>> --          igb3         yes          no   no   no   yes       no
>>
>> On the firewall side, interfaces were up but indicating no LACP traffic:
>>
>> jb@hlv-srx240> show lacp interfaces ae0
>> Aggregated interface: ae0
>>     LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
>>       ge-0/0/12      Actor    No   Yes    No   No   No   Yes     Fast    Active
>>       ge-0/0/12    Partner    No   Yes    No   No   No   Yes     Fast   Passive
>>       ge-0/0/13      Actor    No   Yes    No   No   No   Yes     Fast    Active
>>       ge-0/0/13    Partner    No   Yes    No   No   No   Yes     Fast   Passive
>>       ge-0/0/14      Actor    No   Yes    No   No   No   Yes     Fast    Active
>>       ge-0/0/14    Partner    No   Yes    No   No   No   Yes     Fast   Passive
>>       ge-0/0/15      Actor    No   Yes    No   No   No   Yes     Fast    Active
>>       ge-0/0/15    Partner    No   Yes    No   No   No   Yes     Fast   Passive
>>     LACP protocol:        Receive State  Transmit State          Mux State
>>       ge-0/0/12               Defaulted   Fast periodic           Detached
>>       ge-0/0/13               Defaulted   Fast periodic           Detached
>>       ge-0/0/14               Defaulted   Fast periodic           Detached
>>       ge-0/0/15               Defaulted   Fast periodic           Detached
>>
>> The configuration is fairly straightforward:
>>
>> root@anto:~ # egrep aggr\|admin /usbkey/config
>> aggr0_aggr=0:14:4f:e7:39:6,0:14:4f:e7:39:7,0:14:4f:e7:39:8,0:14:4f:e7:39:9
>> aggr0_lacp_mode=active
>> # admin_nic is the nic admin_ip will be connected to for headnode zones.
>> admin_nic=aggr0
>> admin_ip=172.16.32.3
>> admin_ip6=2001:470:deeb:32::3/64
>> admin_netmask=255.255.255.0
>> admin_network=...
>> admin_gateway=172.16.32.11
>> internal_nic=aggr0
>>
>> After rebooting back into 20160804T173241Z, everything is fine again -
>> LACP comes up immediately.
>>
>> Any ideas? The changelog didn't mention anything specifically relevant
>> (searching for "lacp", "igb") that I could see...
> 
> Did you also happen to grab dladm show-aggr -x when you were on the
> system where it wasn't working?
> 
> What does dladm show-aggr -L and dladm show-aggr -x show now that you're
> on the system where it's working?
> 
> Thanks,
> Robert
> 

