[c-nsp] 3750 taking errors on multiple ports

2022-07-11 Thread Randy Bush
some hours back, an antique Cisco WS-C3750G-POE-48 started taking output
errors on multiple ports.  they are generally vlan edge, a la

interface GigabitEthernet1/0/31
 description x.sea eth1
 switchport access vlan 10

shows output drops

antique.sea#show interfaces GigabitEthernet1/0/31
GigabitEthernet1/0/31 is up, line protocol is up (connected) 
  Hardware is Gigabit Ethernet, address is c40a.cb51.4d9f (bia c40a.cb51.4d9f)
  Description: x.sea eth1
  MTU 9000 bytes, BW 100 Kbit/sec, DLY 10 usec, 
 reliability 255/255, txload 5/255, rxload 2/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
  input flow-control is off, output flow-control is unsupported 
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input never, output 00:00:01, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 1532
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 11292000 bits/sec, 689 packets/sec
  5 minute output rate 22658000 bits/sec, 765 packets/sec
 2153269 packets input, 3859557279 bytes, 0 no buffer
 Received 68 broadcasts (4 multicasts)
 0 runts, 0 giants, 0 throttles
 0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
 0 watchdog, 4 multicast, 0 pause input
 0 input packets with dribble condition detected
 2416117 packets output, 9034236494 bytes, 0 underruns
 0 output errors, 0 collisions, 1 interface resets
 0 unknown protocol drops
 0 babbles, 0 late collision, 0 deferred
 0 lost carrier, 0 no carrier, 0 pause output
 0 output buffer failures, 0 output buffers swapped out

any clues on how to debug?  thanks.
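
One thing worth separating first: the output above reports 1532 "Total output drops" but 0 "output errors", and those are different counters. As a minimal sketch, assuming the `show interfaces` text has already been captured as a string by whatever collection method is in use, the two values can be pulled out and tracked over time. The function name and the sample text layout are illustrative only.

```python
import re


def parse_output_counters(show_interface_text: str) -> dict:
    """Extract output drops and output errors from 'show interfaces' text."""
    drops = re.search(r"Total output drops:\s*(\d+)", show_interface_text)
    errors = re.search(r"(\d+)\s+output errors", show_interface_text)
    return {
        "output_drops": int(drops.group(1)) if drops else None,
        "output_errors": int(errors.group(1)) if errors else None,
    }


# Three lines taken from the output quoted in the message above.
SAMPLE = """\
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 1532
     2416117 packets output, 9034236494 bytes, 0 underruns
     0 output errors, 0 collisions, 1 interface resets
"""

counters = parse_output_counters(SAMPLE)
print(counters)
```

Comparing two such snapshots taken a few minutes apart would show whether the drop counter is still increasing or is left over from an earlier burst, since the counters on this port have never been cleared.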

randy
___
cisco-nsp mailing list  cisco-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/


Re: [c-nsp] ASR920 randomly loosing layer-2 on a port

2022-07-11 Thread Gert Doering
Hi,

On Mon, Jul 11, 2022 at 04:47:57PM +, Brian Turnbow wrote:
> > On Mon, Jul 11, 2022 at 03:59:02PM +, Brian Turnbow wrote:
> > > Yep, sounds like the infamous uptime over 2 years "feature" from 3.16
> > > (something)..
> > > Reboot and upgrade was the only way we fixed it
> > 
> > Uh.  Could you elaborate on what that "feature" is, exactly?
> 
> It was the bug where, after two years of uptime,
> if an interface went down it would stick as up and not pass traffic.
> You could not provision new interfaces.
> Counters also stopped working. (we used this to find affected units)

Now THAT is interesting.  I'm a bit further distanced from day-to-day
operations these days (otherwise I might have noticed), but indeed,
counters didn't work anymore either.  "No traffic on this box!" which
I know to be not true (our daily TSM backups go through there...) - and
after reboot, "Traffic!".

Very interesting.  "Interface itself" counters are all "0", but service
instance counters (gi0/0/2 si 90) still show traffic.  So that's actually
something our alarming could trigger on "si has > 1 Mbit, interface itself
has 0"...
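
A minimal sketch of that alarm condition, assuming the monitoring system already provides both rates in bits per second. The `PortSample` type, the field names and the default threshold are hypothetical and only illustrate the comparison, not any particular monitoring tool's API.

```python
from dataclasses import dataclass


@dataclass
class PortSample:
    """Traffic rates for one port, as reported by the monitoring system."""
    interface_bps: float          # rate derived from the interface counters
    service_instance_bps: float   # rate derived from the service instance counters


def counters_look_frozen(sample: PortSample, threshold_bps: float = 1_000_000) -> bool:
    """True when the service instance carries traffic but the interface shows none."""
    return sample.interface_bps == 0 and sample.service_instance_bps > threshold_bps


# Service instance shows 5 Mbit/s while the interface itself reports nothing.
print(counters_look_frozen(PortSample(interface_bps=0, service_instance_bps=5_000_000)))
```

The threshold keeps an idle port, where both rates are near zero, from raising the alarm.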

[..]
> Sounds like it may be different.
> Did the counters work?
> Maybe they decided to add it into 16.06, you never know what a BU may decide
> is a must have feature

Obviously, 16.06 has much improved performance, so 2-year-bugs are now 
hit after 0.5 years already!

OTOH... seems it wasn't actually 27 weeks uptime, but quite a bit more,
which was just distorted by SNMP uptime wrapping (and our prometheus
instance not properly distinguishing this for old data, it only recently
learned to query that other OID).
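
For reference, a rough sketch of why a wrapped uptime value is ambiguous. It assumes the value being polled was the standard 32-bit `sysUpTime` TimeTicks counter (hundredths of a second), which the message does not state explicitly. The function name is illustrative.

```python
TIMETICKS_PER_SECOND = 100            # TimeTicks count hundredths of a second
WRAP_SECONDS = 2**32 / TIMETICKS_PER_SECOND
WRAP_DAYS = WRAP_SECONDS / 86400      # roughly 497 days per wrap


def candidate_uptimes_days(reported_days: float, max_wraps: int = 3) -> list:
    """Real uptimes consistent with a reported value, for 0..max_wraps wraps."""
    return [reported_days + n * WRAP_DAYS for n in range(max_wraps + 1)]


reported = 27.4 * 7                   # the 27.4 weeks shown by monitoring, in days
for wraps, days in enumerate(candidate_uptimes_days(reported)):
    print(f"{wraps} wrap(s): {days:.1f} days")
```

The reported value alone cannot tell these candidates apart, which is why a second, non-wrapping uptime source is needed.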

So, definitely more than 2 years, and traffic counters stopped some 5 months
ago...  and we did not try to actually bring up anything new since then.

Yeah, thanks a lot for this information.  This will be very helpful to
avoid needless frustration by our on-site people ("it does not link! can
you please try a different cable?  did you get the patch right?").

gert

-- 
"If was one thing all people took for granted, was conviction that if you 
 feed honest figures into a computer, honest figures come out. Never doubted 
 it myself till I met a computer with a sense of humor."
 Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany g...@greenie.muc.de




Re: [c-nsp] ASR920 randomly loosing layer-2 on a port

2022-07-11 Thread Lukas Tribus
Hello,

On Mon, 11 Jul 2022 at 18:20, Adrian Minta wrote:
> Yes, this is one of the bugs in 3.x trains. The solution is to upgrade
> to something like 16.12.x.

Well, we don't really know what the solution is, unless someone is
actually running a significant number of boxes of previously affected
hardware with the latest release Cisco claims fixes this issue for at
least 890 days without issues.

Which is impossible, since Cisco's latest claim about a fix for
CSCvk35460 / CSCvw93411 is 16.12.6 (not 16.12.x), which was only
released in September 2021, so the early massive adopters of this
release will know in ... February 2024.
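
The date arithmetic can be checked with a short sketch. The exact release day in September 2021 is not given here, so the two upgrade dates below are hypothetical bounds for "upgraded sometime in September 2021".

```python
from datetime import date, timedelta

UPTIME_THRESHOLD_DAYS = 890  # uptime figure used in this message


def earliest_confirmation(upgrade_day: date,
                          threshold_days: int = UPTIME_THRESHOLD_DAYS) -> date:
    """First day on which a box upgraded on upgrade_day has run threshold_days."""
    return upgrade_day + timedelta(days=threshold_days)


# Hypothetical upgrade dates at the start and end of September 2021.
for upgraded in (date(2021, 9, 1), date(2021, 9, 30)):
    print(upgraded, "->", earliest_confirmation(upgraded))
```

An upgrade on the first day of September 2021 reaches 890 days on 2024-02-08, consistent with the February 2024 estimate above. Anything upgraded later in the month slips into March.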

It's too easy for an engineer to say: hey you know what, we updated
low level firmware in the release published last week, I'm sure this
is related, why don't you upgrade to that release and let me know how
things go ... in 890 days (necessarily).


This issue was previously discussed here in 2019 [1]. I assume the
platform will be EOLed before we actually know for sure, wouldn't be
the first time [2].


cheers,
lukas


[1] https://www.mail-archive.com/cisco-nsp@puck.nether.net/msg66833.html
[2] https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/Cisco-SA-20140828-CVE-2014-3347


Re: [c-nsp] ASR920 randomly loosing layer-2 on a port

2022-07-11 Thread Shawn L
I don't believe that my issue is uptime related.  A cold-boot of the router
didn't fix anything.  I am going to work on upgrading the IOS and see what
happens.



On Mon, Jul 11, 2022 at 12:50 PM Adrian Minta wrote:

>
> On 7/11/22 19:31, Shawn L wrote:
> > A-ha.  I was still on 3.18.06.  I'll try that
> >
> > Shawn
> >
> > On Mon, Jul 11, 2022 at 12:26 PM Adrian Minta wrote:
> >
> > > Yes, this is one of the bugs in 3.x trains. The solution is to
> > > upgrade to something like 16.12.x.
> > >
> > > --
> > > Best regards,
> > > Adrian Minta
> >
>
> Please be aware of a few things:
>
> - the flash filesystem will be upgraded (no easy downgrade)
>
> - the reboot will take around 25 minutes
>
> - in some rare cases a cold reboot may be required
>
> --
> Best regards,
> Adrian Minta
>


Re: [c-nsp] ASR920 randomly loosing layer-2 on a port

2022-07-11 Thread Adrian Minta


On 7/11/22 19:31, Shawn L wrote:
> A-ha.  I was still on 3.18.06.  I'll try that
>
> Shawn
>
> On Mon, Jul 11, 2022 at 12:26 PM Adrian Minta wrote:
>
> > Yes, this is one of the bugs in 3.x trains. The solution is to
> > upgrade to something like 16.12.x.
> >
> > --
> > Best regards,
> > Adrian Minta

Please be aware of a few things:

- the flash filesystem will be upgraded (no easy downgrade)

- the reboot will take around 25 minutes

- in some rare cases a cold reboot may be required

--
Best regards,
Adrian Minta



Re: [c-nsp] ASR920 randomly loosing layer-2 on a port

2022-07-11 Thread Gert Doering
Hi,

On Mon, Jul 11, 2022 at 03:59:02PM +, Brian Turnbow wrote:
> Yep, sounds like the infamous uptime over 2 years "feature" from 3.16 
> (something)..
> Reboot and upgrade was the only way we fixed it

Uh.  Could you elaborate on what that "feature" is, exactly?

We recently had an ASR920-12CZ which stubbornly refused to establish
link on any 1GE ports ("as if no cable was connected"), though *existing*
links worked just fine.  After a reboot, all ports back to normal.

This was on 16.06.05a - but the uptime was only 27.4 weeks, our
monitoring says... - so, maybe a different "feature"...

gert
-- 
"If was one thing all people took for granted, was conviction that if you 
 feed honest figures into a computer, honest figures come out. Never doubted 
 it myself till I met a computer with a sense of humor."
 Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany g...@greenie.muc.de




Re: [c-nsp] ASR920 randomly loosing layer-2 on a port

2022-07-11 Thread Shawn L
A-ha.  I was still on 3.18.06.  I'll try that

Shawn

On Mon, Jul 11, 2022 at 12:26 PM Adrian Minta wrote:

>
> On 7/11/22 16:22, Shawn L wrote:
> > I have a strange one.  I have a ASR-920-4SZ ( 2 copper ports, 4 10-gig sfp
> > ports all licensed).
> >
> >
> > A day or 2 ago, the connection dropped and we're back to the same situation
> > again.  Link is up, but not learning mac addresses from te0/0/4.  Nothing
> > has changed (which we verified) since we got the circuit working the first
> > time.  Bouncing the interface, going back to auto negotiate, etc. doesn't
> > seem to help.
> >
> > Wondering if anyone's seen this before or has any ideas.  I know the asr920
> > is 'fun' and a 1-gig sfp in a 1-gig/10-gig slot isn't the greatest idea
> > (thinking of replacing it with something with more copper ports), but I'm
> > trying to figure out why it worked before and suddenly stopped in the
> > meantime.
> >
> >
> Yes, this is one of the bugs in 3.x trains. The solution is to upgrade
> to something like 16.12.x.
>
> --
> Best regards,
> Adrian Minta
>
>


Re: [c-nsp] ASR920 randomly loosing layer-2 on a port

2022-07-11 Thread Adrian Minta



On 7/11/22 16:22, Shawn L wrote:
> I have a strange one.  I have a ASR-920-4SZ ( 2 copper ports, 4 10-gig sfp
> ports all licensed).
>
>
> A day or 2 ago, the connection dropped and we're back to the same situation
> again.  Link is up, but not learning mac addresses from te0/0/4.  Nothing
> has changed (which we verified) since we got the circuit working the first
> time.  Bouncing the interface, going back to auto negotiate, etc. doesn't
> seem to help.
>
> Wondering if anyone's seen this before or has any ideas.  I know the asr920
> is 'fun' and a 1-gig sfp in a 1-gig/10-gig slot isn't the greatest idea
> (thinking of replacing it with something with more copper ports), but I'm
> trying to figure out why it worked before and suddenly stopped in the
> meantime.

Yes, this is one of the bugs in 3.x trains. The solution is to upgrade
to something like 16.12.x.


--
Best regards,
Adrian Minta




[c-nsp] ASR920 randomly loosing layer-2 on a port

2022-07-11 Thread Shawn L
I have a strange one.  I have an ASR-920-4SZ (2 copper ports, 4 10-gig SFP
ports, all licensed).

In one of the 10gig sfp ports I have a cisco copper SFP.  The
interface configuration is really basic

interface TenGigabitEthernet0/0/4
 description P2P Connection to 
 no ip address
 no negotiation auto
 service instance 58 ethernet
  encapsulation untagged
  bridge-domain 58

It's a point-to-point (really just a vlan) to another site.  When we turned
it up, it didn't want to work right away -- I could see mac addresses from
the remote side, but not the local.  Nothing I did seemed to make it want
to learn mac addresses.  Shut/no-shut, plugging a laptop into it, verifying
SFP was supported and worked in another router, etc.  In all cases the link
would come up, but no layer-2 traffic.

The last thing that I did was to add the no negotiation auto to the
interface thinking SFP-strangeness, toggle what you can, whatever.  At that
point we got side-tracked and when we looked at it again an hour later it
was all working correctly.  So we left it as it was.

A day or 2 ago, the connection dropped and we're back to the same situation
again.  Link is up, but not learning mac addresses from te0/0/4.  Nothing
has changed (which we verified) since we got the circuit working the first
time.  Bouncing the interface, going back to auto negotiate, etc. doesn't
seem to help.

Wondering if anyone's seen this before or has any ideas.  I know the asr920
is 'fun' and a 1-gig sfp in a 1-gig/10-gig slot isn't the greatest idea
(thinking of replacing it with something with more copper ports), but I'm
trying to figure out why it worked before and suddenly stopped in the
meantime.


Thanks