Re: [c-nsp] ASR920 randomly loosing layer-2 on a port
We upgraded the 920 to 16.12.06 this morning. No change. Still not learning MAC addresses on port te0/0/4.

So, back to the drawing board.

___
cisco-nsp mailing list cisco-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/
Re: [c-nsp] ASR920 randomly loosing layer-2 on a port
Hi,

On Mon, Jul 11, 2022 at 04:47:57PM +, Brian Turnbow wrote:
> > On Mon, Jul 11, 2022 at 03:59:02PM +, Brian Turnbow wrote:
> > > Yep, sounds like the infamous uptime over 2 years "feature" from 3.16
> > > (something)..
> > > Reboot and upgrade was the only way we fixed it
> >
> > Uh. Could you elaborate on what that "feature" is, exactly?
>
> It was the bug where after two years of uptime
> If an interface went down it would stick as up and not pass traffic
> You could not provision new interfaces.
> Counters also stopped working. (we used this to find affected units)

Now THAT is interesting. I'm a bit further distanced from day-to-day operations these days (otherwise I might have noticed), but indeed, counters didn't work anymore either. "No traffic on this box!" which I know to be not true (our daily TSM backups go through there...) - and after reboot, "Traffic!".

Very interesting. "Interface itself" counters are all "0", but service instance counters (gi0/0/2 si 90) still show traffic. So that's actually something our alarming could trigger on: "si has > 1 Mbit, interface itself has 0"...

[..]
> Sounds like it may be different.
> Did the counters work?
> Maybe they decided to add it into 16.06, you never know what a BU may decide
> is a must have feature

Obviously, 16.06 has much improved performance, so 2-year-bugs are now hit after 0.5 years already!

OTOH... seems it wasn't actually 27 weeks uptime, but quite a bit more, which was just distorted by SNMP uptime wrapping (and our Prometheus instance not properly distinguishing this for old data, it only recently learned to query that other OID).

So, definitely more than 2 years, and traffic counters stopped some 5 months ago... and we did not try to actually bring up anything new since then.

Yeah, thanks a lot for this information. This will be very helpful to avoid needless frustration by our on-site people ("it does not link! can you please try a different cable? did you get the patch right?").

gert
-- 
"If was one thing all people took for granted, was conviction that if you feed honest figures into a computer, honest figures come out. Never doubted it myself till I met a computer with a sense of humor."
    Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany
g...@greenie.muc.de
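The alarm condition mentioned above ("si has > 1 Mbit, interface itself has 0") could be sketched roughly as follows. This is only an illustration in Python, not something from the thread: the `PortRates` type, the `counters_look_stuck` function and the idea that both rates are already available in bits per second are assumptions. How the rates are collected (SNMP, Prometheus, CLI scraping) is left open.

```python
from dataclasses import dataclass


@dataclass
class PortRates:
    """Hypothetical container for rates already collected by monitoring."""
    name: str
    interface_bps: float          # rate derived from the physical interface counters
    service_instance_bps: float   # summed rate of the service instances on that port


def counters_look_stuck(port: PortRates, si_threshold_bps: float = 1_000_000) -> bool:
    """True if service instances carry traffic while the interface reports none.

    The 1 Mbit/s default mirrors the threshold suggested in the message.
    """
    return port.interface_bps == 0 and port.service_instance_bps > si_threshold_bps


# Illustrative values only, not measurements from a real box.
ports = [
    PortRates("gi0/0/2", interface_bps=0, service_instance_bps=40_000_000),
    PortRates("gi0/0/3", interface_bps=12_000_000, service_instance_bps=12_000_000),
]
suspects = [p.name for p in ports if counters_look_stuck(p)]
print(suspects)
```

A port that is simply idle (both rates at zero) does not trigger the check, which is the point of comparing the two counter sources rather than alerting on "interface has 0" alone.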
Re: [c-nsp] ASR920 randomly loosing layer-2 on a port
Hello,

On Mon, 11 Jul 2022 at 18:20, Adrian Minta wrote:
> Yes, this is one of the bugs in 3.x trains. The solution is to upgrade
> to something like 16.12.x.

Well, we don't really know what the solution is, unless someone is actually running a significant number of boxes of previously affected hardware with the latest release Cisco claims fixes this issue for at least 890 days without issues.

Which is impossible, since Cisco's latest claim about a fix for CSCvk35460 / CSCvw93411 is 16.12.6 (not 16.12.x), which was only released in September 2021, so the early massive adopters of this release will know in ... February 2024.

It's too easy for an engineer to say: hey you know what, we updated low level firmware in the release published last week, I'm sure this is related, why don't you upgrade to that release and let me know how things go ... in 890 days (necessarily).

This issue was previously discussed here in 2019 [1]. I assume the platform will be EOLed before we actually know for sure, wouldn't be the first time [2].

cheers,
lukas

[1] https://www.mail-archive.com/cisco-nsp@puck.nether.net/msg66833.html
[2] https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/Cisco-SA-20140828-CVE-2014-3347
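The "February 2024" estimate above is plain date arithmetic: release month plus 890 days of uninterrupted uptime. A small Python sketch, assuming a box was upgraded somewhere in September 2021 (the exact release day is not given in the thread, so the two dates below are placeholders for the start and end of that month):

```python
from datetime import date, timedelta

# Uptime figure quoted in the message; not independently verified here.
UPTIME_THRESHOLD_DAYS = 890


def earliest_verdict(deployed_on: date) -> date:
    """First day on which a box upgraded on `deployed_on` has 890 days of uptime."""
    return deployed_on + timedelta(days=UPTIME_THRESHOLD_DAYS)


# Placeholder deployment dates bracketing September 2021.
for deployed in (date(2021, 9, 1), date(2021, 9, 30)):
    print(deployed, "->", earliest_verdict(deployed))
```

Under those assumptions the window runs from early February to early March 2024, and only for boxes that were never rebooted in between.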
Re: [c-nsp] ASR920 randomly loosing layer-2 on a port
I don't believe that my issue is uptime related. A cold-boot of the router didn't fix anything. I am going to work on upgrading the IOS and see what happens.
Re: [c-nsp] ASR920 randomly loosing layer-2 on a port
On 7/11/22 19:31, Shawn L wrote:
> A-ha. I was still on 3.18.06. I'll try that
>
> Shawn
>
> On Mon, Jul 11, 2022 at 12:26 PM Adrian Minta wrote:
> > Yes, this is one of the bugs in 3.x trains. The solution is to upgrade
> > to something like 16.12.x.

Please be aware of a few things:
- the flash filesystem will be upgraded (no easy downgrade)
- the reboot will take around 25 minutes
- in some rare cases a cold reboot may be required

-- 
Best regards,
Adrian Minta
Re: [c-nsp] ASR920 randomly loosing layer-2 on a port
Hi,

On Mon, Jul 11, 2022 at 03:59:02PM +, Brian Turnbow wrote:
> Yep, sounds like the infamous uptime over 2 years "feature" from 3.16
> (something)..
> Reboot and upgrade was the only way we fixed it

Uh. Could you elaborate on what that "feature" is, exactly?

We recently had an ASR920-12CZ which stubbornly refused to establish link on any 1GE ports ("as if no cable was connected"), though *existing* links worked just fine. After a reboot, all ports back to normal.

This was on 16.06.05a - but the uptime was only 27.4 weeks, our monitoring says... - so, maybe a different "feature"...

gert
-- 
Gert Doering - Munich, Germany
g...@greenie.muc.de
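As the follow-up in this thread notes, the 27.4 weeks reported by monitoring turned out to be distorted by SNMP uptime wrapping. Assuming the monitoring polled the standard sysUpTime object (a 32-bit TimeTicks value counting hundredths of a second; the thread does not say which OID was used), the counter wraps roughly every 497 days, so the observed value only pins the real uptime down modulo that period. A hedged Python sketch of the candidates:

```python
# sysUpTime is a 32-bit TimeTicks counter in units of 1/100 second,
# so it wraps after 2**32 / 100 seconds.
WRAP_SECONDS = 2**32 / 100
WRAP_DAYS = WRAP_SECONDS / 86_400


def candidate_uptimes_days(observed_days: float, max_wraps: int = 3) -> list[float]:
    """Possible real uptimes, given an observed value that may have wrapped."""
    return [observed_days + wraps * WRAP_DAYS for wraps in range(max_wraps + 1)]


observed = 27.4 * 7  # the 27.4 weeks from the message, in days
for wraps, days in enumerate(candidate_uptimes_days(observed)):
    print(f"{wraps} wrap(s): {days:7.1f} days = {days / 365.25:.2f} years")
```

Which candidate applies cannot be decided from the wrapped value alone; it needs a second source such as a non-wrapping uptime OID or the box's own `show version` output.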
Re: [c-nsp] ASR920 randomly loosing layer-2 on a port
A-ha. I was still on 3.18.06. I'll try that

Shawn
Re: [c-nsp] ASR920 randomly loosing layer-2 on a port
On 7/11/22 16:22, Shawn L wrote:
> I have a strange one. I have an ASR-920-4SZ (2 copper ports, 4 10-gig SFP
> ports, all licensed).
>
> A day or 2 ago, the connection dropped and we're back to the same situation
> again. Link is up, but not learning MAC addresses from te0/0/4. Nothing
> has changed (which we verified) since we got the circuit working the first
> time. Bouncing the interface, going back to auto negotiate, etc. doesn't
> seem to help.
>
> Wondering if anyone's seen this before or has any ideas. I know the ASR920
> is 'fun' and a 1-gig SFP in a 1-gig/10-gig slot isn't the greatest idea
> (thinking of replacing it with something with more copper ports), but I'm
> trying to figure out why it worked before and suddenly stopped in the
> meantime.

Yes, this is one of the bugs in 3.x trains. The solution is to upgrade to something like 16.12.x.

-- 
Best regards,
Adrian Minta