Call for Presentations: NANOG 57 in Orlando, FL

2012-11-08 Thread David Temkin
NANOG Community,

I know that we all just left Dallas after NANOG 56, but the NANOG Program 
Committee is already hard at work preparing for NANOG 57 in Orlando!  

The North American Network Operators' Group (NANOG) will hold its 57th 
meeting in Orlando, FL, February 4th through 6th.  Of special note, this 
is the first meeting with a full Monday-through-Wednesday agenda.  
Our host, CyrusOne, looks forward to welcoming you to the Renaissance Orlando 
at SeaWorld.

The NANOG Program Committee is now seeking proposals for presentations, panels, 
tutorials, track sessions, and keynote materials for the NANOG 57 program. We 
invite presentations highlighting issues relating to technology already 
deployed or soon-to-be deployed in the Internet. Vendors are encouraged to work 
with operators to present real-world deployment experiences with the vendor's 
products and interoperability. NANOG 57 submissions are welcome at  
http://pc.nanog.org  

For further information on what the Program Committee is seeking, please see 
http://www.nanog.org/meetings/nanog57/callforpresentations.html

This will also be our first meeting after the 2012 WCIT in early December, and 
we expect topical and timely presentations regarding the results.

When considering submitting a presentation, keep these important dates in 
mind: 

Presentation Abstracts and Draft Slides Due: 10-December-2012
Final Slides Due:                            7-January-2013
Draft Program Published:                     14-January-2013
Final Agenda Published:                      18-January-2013

Please submit your materials to http://pc.nanog.org 

Looking forward to seeing everyone in Orlando! 

-Dave Temkin

Looking for an outside plant contact Zayo/AboveNet Manhattan

2012-11-08 Thread Christopher J. Pilkington
We're looking at some emergency office space in Manhattan and we identified
an AboveNet/Zayo fiber panel in the space.  Would like to see if someone
could confirm if it is viable.

Anyone from Abovenet lurking?

Thanks,
-cjp


route-views.eqix DC METRO AREA IX RENUMBERING

2012-11-08 Thread John Kemp

We have the renumber interface enabled and configured for
any known peers to make the transition for the RouteViews
EQUINIX ASHBURN route collector.

OLD PEERING ADDRESS: 206.223.115.142
NEW PEERING ADDRESS: 206.126.236.142

Current v4 peer list looks like below.  If you need to check,
telnet to route-views.eqix.routeviews.org

Thanks,
John Kemp (k...@routeviews.org)

Neighbor        V    AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down  State/PfxRcd
206.126.236.10  4  4589       0       0      0   0    0 never    Active
206.126.236.12  4  2914       0       0      0   0    0 never    Active
206.126.236.19  4  3257       0       0      0   0    0 never    Active
206.126.236.24  4 11666       0       0      0   0    0 never    Active
206.126.236.25  4  6079       0       0      0   0    0 never    Active
206.126.236.26  4 16559       0       0      0   0    0 never    Active
206.126.236.37  4  6939       0       0      0   0    0 never    Active
206.126.236.47  4 19151       0       0      0   0    0 never    Active
206.126.236.52  4  4565       0       0      0   0    0 never    Active
206.126.236.58  4 32098       0       0      0   0    0 never    Active
206.126.236.60  4  4436       0       0      0   0    0 never    Active
206.126.236.61  4  4436       0       0      0   0    0 never    Active
206.126.236.76  4  5769       0       0      0   0    0 never    Active
206.126.236.81  4  6453       0       0      0   0    0 never    Active
206.126.236.109 4 19166       0       0      0   0    0 never    Active
206.126.236.120 4 41095       0       0      0   0    0 never    Active
206.126.236.156 4  7795       0       0      0   0    0 never    Active
206.126.236.181 4  8781       0       0      0   0    0 never    Active
206.223.115.10  4  4589  168086    1454      0   0    0 1d00h12m 428226
206.223.115.12  4  2914  335570    1454      0   0    0 1d00h12m 422014
206.223.115.19  4  3257  306479    2886      0   0    0 1d00h12m 421751
206.223.115.24  4 11666  350048    2886      0   0    0 1d00h12m 427034
206.223.115.25  4  6079  127539    1454      0   0    0 1d00h12m 421048
206.223.115.26  4 16559  150834    1454      0   0    0 1d00h12m 422475
206.223.115.37  4  6939  280514    1454      0   0    0 1d00h12m 426934
206.223.115.47  4 19151  179133    1454      0   0    0 1d00h12m 424028
206.223.115.52  4  4565    3061    2886      0   0    0 1d00h12m 2058
206.223.115.58  4 32098    5855    1454      0   0    0 1d00h12m 957
206.223.115.60  4  4436  235866    2886      0   0    0 1d00h12m 422375
206.223.115.61  4  4436  237259    2886      0   0    0 1d00h12m 422375
206.223.115.76  4  5769  205324    1454      0   0    0 1d00h12m 422482
206.223.115.81  4  6453       0       0      0   0    0 never    Active
206.223.115.109 4 19166       0       0      0   0    0 never    Active
206.223.115.120 4 41095  166604    1454      0   0    0 1d00h12m 422118
206.223.115.156 4  7795    3176    1454      0   0    0 1d00h12m 191
206.223.115.181 4  8781    6157    1454      0   0    0 1d00h12m 764
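
For peers scripting their own config change, the listing suggests that each address keeps its final octet when moving from 206.223.115.x to 206.126.236.x. The Python sketch below applies that mapping. It rests on that assumption, which is inferred from the listing rather than stated in the announcement, and the function name renumbered is hypothetical.

```python
import ipaddress

OLD_PREFIX = "206.223.115."   # old peering LAN addresses seen in the listing
NEW_PREFIX = "206.126.236."   # new peering LAN addresses seen in the listing


def renumbered(old_addr: str) -> str:
    """Return the new-LAN address for an old-LAN address, keeping the last octet.

    Assumption: the final octet is preserved, as the listing suggests.
    """
    ipaddress.IPv4Address(old_addr)  # raises ValueError on malformed input
    if not old_addr.startswith(OLD_PREFIX):
        raise ValueError(f"{old_addr} is not on the old peering LAN")
    return NEW_PREFIX + old_addr[len(OLD_PREFIX):]


if __name__ == "__main__":
    print(renumbered("206.223.115.142"))  # the collector's own address
```

Please confirm your actual new address with the exchange before relying on a mapping like this.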



-- 
John Kemp (k...@routeviews.org)
RouteViews Engineer
NOC: n...@routeviews.org
MAIL: h...@routeviews.org
WWW: http://www.routeviews.org




Re: MTU issues s0.wp.com

2012-11-08 Thread Tassos Chatzithomaoglou
Same here too... I don't know whether having a direct peering with Edgecast 
would solve the issue.

--
Tassos

Brian Keefer wrote on 8/11/2012 05:08:
 On Nov 6, 2012, at 4:33 AM, Seth Mos wrote:

 Hi,

 Since about a week or so it's become impossible to reach wp.com content over 
 IPv6.

 IPv4 content does work fine, using the IPv6 literal returns a 404 which is 
 small enough to fit in a smaller 1480 byte MTU.

 I have another test site that has a clean 1500 byte mtu and I can fetch the 
 s0.wp.com page from there.

 It looks like tunneled IPv6 users might be in hurt here.

 Is anyone else experiencing similar issues?

 My traceroute shows they are employing a CDN for s0.wp.com, so not everyone 
 might be affected.

 7  asd2-rou-1022.NL.eurorings.net (2001:680:0:800f::291)  6.460 ms 6.203 ms  
 6.188 ms
 8  asd2-rou-1044.eurorings.net (2001:680::134:222:85:63)  6.447 ms 6.494 ms  
 6.495 ms
 9  adm-b5-link.telia.net (2001:2000:3080:6f::1)  6.818 ms  6.936 ms 6.891 ms
 10  ldn-b3-v6.telia.net (2001:2000:3018:5::1)  15.290 ms  27.481 ms 15.380 ms
 11  edgecast-ic-147468-ldn-b3.c.telia.net (2001:2000:3080:378::2) 15.116 ms  
 15.174 ms  15.176 ms
 12  2606:2800:234:1922:15a7:17bf:bb7:f09 
 (2606:2800:234:1922:15a7:17bf:bb7:f09)  15.496 ms  15.327 ms  15.460 ms

 Kind regards,

 Seth

 Exact same issue here over HE.net tunnel. I can get errors from the 
 (presumably) front-end proxy, but content stalls forever. I'm seeing this for 
 all WP related requests that go to EdgecastCDN.

 --
 chort
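
The arithmetic behind the symptom described above can be sketched in a few lines of Python. This is one plausible reading of the reports, not a confirmed diagnosis: if full-size segments are sent toward a 1480-byte tunnel and the resulting ICMPv6 Packet Too Big messages never make it back, small replies (such as the 404) get through while larger content stalls. The header sizes are the fixed IPv6 header and a TCP header without options; the function names are hypothetical.

```python
IPV6_HEADER = 40  # fixed IPv6 header, bytes
TCP_HEADER = 20   # TCP header without options, bytes


def max_tcp_payload(path_mtu: int) -> int:
    """Largest TCP payload that fits in one IPv6 packet on this path."""
    return path_mtu - IPV6_HEADER - TCP_HEADER


def fits(payload: int, path_mtu: int) -> bool:
    """True if a segment carrying `payload` bytes fits without fragmentation."""
    return payload <= max_tcp_payload(path_mtu)


if __name__ == "__main__":
    full_size = max_tcp_payload(1500)      # what a 1500-byte path allows
    print(full_size, max_tcp_payload(1480))
    print(fits(full_size, 1480))           # full-size segment over the tunnel
```

A segment sized for a clean 1500-byte path is 20 bytes too large for the 1480-byte tunnel, which is why clamping the MSS or fixing Packet Too Big delivery are the usual things to check.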








RE: Sandy seen costing telco, cable hundreds of millions of dollars

2012-11-08 Thread Vinny_Abello
Agreed... I live in the same general vicinity in NJ as Alex, and AT&T service was 
pretty much non-existent anywhere there was no power, from what I experienced. I 
have friends on Verizon to whom I've spoken, and they didn't seem to notice as 
large of an impact at all on their cellular service.

-Vinny

-----Original Message-----
From: Alex Rubenstein [mailto:a...@corp.nac.net] 
Sent: Wednesday, November 07, 2012 9:39 AM
To: 'na...@jima.tk'; 'nanog@nanog.org'
Subject: Re: Sandy seen costing telco, cable hundreds of millions of dollars

Probably ATT. Many areas of NJ had zero service from them for days. 


----- Original Message -----
From: Jima na...@jima.tk
To: nanog nanog@nanog.org
Sent: Wed Nov 07 09:32:25 2012
Subject: RE: Sandy seen costing telco, cable hundreds of millions of dollars

On Tuesday, 2012-11-06, Frank Bulk wrote:
 So which wireless carrier is bringing down the average to 81%?

 A quick skim of the article (again,
http://www.reuters.com/article/2012/11/01/storm-sandy-telecoms-idUSL1E8M1L9Z20121101
) makes me suspect ATT.  They're mentioned twice in other context, but
there's not a sites-online statistic for them.

 I suppose it's worth noting that this wouldn't be the first time they've
caught flak for their (in)ability to cover NYC sufficiently.

 Jima


Whats so difficult about ISSU

2012-11-08 Thread Kasper Adel
Hello,

We've been hearing about ISSU for so many years, and I haven't heard that any
vendor has been able to achieve it yet.

What is the technical reason behind that?

If I understand correctly, it would be done simply by having
extra ASICs/HW to build dual circuits accessing the same memory,
and gracefully switching from one to the other. Is that right?

Thanks,
Kim


Re: Whats so difficult about ISSU

2012-11-08 Thread Zaid Ali
Cisco Nexus platform does it pretty well so they have achieved it. 

Zaid
 
On Nov 8, 2012, at 3:22 PM, Kasper Adel wrote:

 Hello,
 
 We've been hearing about ISSU for so many years and i didnt hear that any
 vendor was able to achieve it yet.
 
 What is the technical reason behind that?
 
 If i understand correctly, the way it will be done would be simply to have
 extra ASICs/HW to be able to build dual circuits accessing the same memory,
 and gracefully switch from one to another. Is that right?
 
 Thanks,
 Kim




Re: Whats so difficult about ISSU

2012-11-08 Thread Kenneth McRae
Juniper also offers it on the EX virtual switching platform.  Works if you
have the correct version of JunOS.

On Thu, Nov 8, 2012 at 3:38 PM, Zaid Ali z...@zaidali.com wrote:

 Cisco Nexus platform does it pretty well so they have achieved it.

 Zaid

 On Nov 8, 2012, at 3:22 PM, Kasper Adel wrote:

  Hello,
 
  We've been hearing about ISSU for so many years and i didnt hear that any
  vendor was able to achieve it yet.
 
  What is the technical reason behind that?
 
  If i understand correctly, the way it will be done would be simply to
 have
  extra ASICs/HW to be able to build dual circuits accessing the same
 memory,
  and gracefully switch from one to another. Is that right?
 
  Thanks,
  Kim





Re: Whats so difficult about ISSU

2012-11-08 Thread Phil
The major vendors have figured it out for the most part by moving to stateful 
synchronization between control plane modules and implementing non-stop 
routing.  

ALU has supported ISSU on minor releases for many years and just added support 
for major releases. 

The Cisco Nexus ISSU works well; I've done an upgrade on a 5K switch and it was 
completely hitless. 

Juniper and Cisco with the 9K have gone through some hurdles, but ISSU is 
actually usable now if the software versions support it.  

The main remaining hurdle is updating microcode on linecards: they still need 
to be rebooted after an upgrade.  

Phil

On Nov 8, 2012, at 6:22 PM, Kasper Adel karim.a...@gmail.com wrote:

 Hello,
 
 We've been hearing about ISSU for so many years and i didnt hear that any
 vendor was able to achieve it yet.
 
 What is the technical reason behind that?
 
 If i understand correctly, the way it will be done would be simply to have
 extra ASICs/HW to be able to build dual circuits accessing the same memory,
 and gracefully switch from one to another. Is that right?
 
 Thanks,
 Kim



Re: Whats so difficult about ISSU

2012-11-08 Thread Kasper Adel
What I was asking about is full ISSU, including microcode. I assume that between
major releases there will be a microcode upgrade most of the time.


On Fri, Nov 9, 2012 at 2:48 AM, Phil bedard.p...@gmail.com wrote:

 The major vendors have figured it out for the most part by moving to
 stateful synchronization between control plane modules and implementing
 non-stop routing.

 ALU has supported ISSU on minor releases for many years and just added
 support for major releases.

 The Cisco Nexus ISSU works well, I've done an upgrade on a 5K switch and
 it was completely hitless.

 Juniper and Cisco with the 9K have gone through some hurdles but ISSU is
 actually usable now if the software versions support it.

 The main remaining hurdle is updating microcode on linecards, they still
 need to be rebooted after an upgrade.

 Phil

 On Nov 8, 2012, at 6:22 PM, Kasper Adel karim.a...@gmail.com wrote:

  Hello,
 
  We've been hearing about ISSU for so many years and i didnt hear that any
  vendor was able to achieve it yet.
 
  What is the technical reason behind that?
 
  If i understand correctly, the way it will be done would be simply to
 have
  extra ASICs/HW to be able to build dual circuits accessing the same
 memory,
  and gracefully switch from one to another. Is that right?
 
  Thanks,
  Kim



Re: Whats so difficult about ISSU

2012-11-08 Thread Kenneth McRae
I have executed it successfully on the MX960 with no issues. EX, on the other
hand, really depends on your version of JunOS.

On Thu, Nov 8, 2012 at 4:19 PM, Alex dreamwave...@yahoo.com wrote:

 http://www.juniper.net/techpubs/en_US/junos/topics/concept/issu-oveview.html

 The Juniper ISSU guide.

 You need two things:

 1. Separation of the control plane and  forwarding plane
 2. 2 routing engines in the same chassis -- the non active RE upgrades
 first, then when its up and running the active one goes into upgrade mode
 and control fails over to the secondary RE which is running the upgraded
 version of the software.

 I assume it works on any vendor that has 2 REs in the same chassis and the
 fwd and control planes are separated, and there is a redundancy protocol
 running between the two REs(like Graceful Switchover on Juniper gear).


 On 11/09/2012 01:42 AM, Kenneth McRae wrote:

 Juniper also offers it on the EX virtual switching platform.  Works if you
 have the correct version of JunOS.

 On Thu, Nov 8, 2012 at 3:38 PM, Zaid Ali z...@zaidali.com wrote:

  Cisco Nexus platform does it pretty well so they have achieved it.

 Zaid

 On Nov 8, 2012, at 3:22 PM, Kasper Adel wrote:

  Hello,

 We've been hearing about ISSU for so many years and i didnt hear that
 any
 vendor was able to achieve it yet.

 What is the technical reason behind that?

 If i understand correctly, the way it will be done would be simply to

 have

 extra ASICs/HW to be able to build dual circuits accessing the same

 memory,

 and gracefully switch from one to another. Is that right?

 Thanks,
 Kim





Re: Whats so difficult about ISSU

2012-11-08 Thread Kenneth McRae
I have performed microcode upgrades using ISSU on the Juniper platform.

On Thu, Nov 8, 2012 at 4:52 PM, Kasper Adel karim.a...@gmail.com wrote:

 What i was asking is full ISSU, even with micro code. I assume between
 Major release there will be microcode upgrade most of the time.


 On Fri, Nov 9, 2012 at 2:48 AM, Phil bedard.p...@gmail.com wrote:

  The major vendors have figured it out for the most part by moving to
  stateful synchronization between control plane modules and implementing
  non-stop routing.
 
  ALU has supported ISSU on minor releases for many years and just added
  support for major releases.
 
  The Cisco Nexus ISSU works well, I've done an upgrade on a 5K switch and
  it was completely hitless.
 
  Juniper and Cisco with the 9K have gone through some hurdles but ISSU is
  actually usable now if the software versions support it.
 
  The main remaining hurdle is updating microcode on linecards, they still
  need to be rebooted after an upgrade.
 
  Phil
 
  On Nov 8, 2012, at 6:22 PM, Kasper Adel karim.a...@gmail.com wrote:
 
   Hello,
  
   We've been hearing about ISSU for so many years and i didnt hear that
 any
   vendor was able to achieve it yet.
  
   What is the technical reason behind that?
  
   If i understand correctly, the way it will be done would be simply to
  have
   extra ASICs/HW to be able to build dual circuits accessing the same
  memory,
   and gracefully switch from one to another. Is that right?
  
   Thanks,
   Kim
 



Re: Whats so difficult about ISSU

2012-11-08 Thread Kasper Adel
Does that mean they are the only vendor capable of doing this today?

I am interested in the technology behind this if this is something public,
any ideas?

Thx

On Friday, November 9, 2012, Kenneth McRae wrote:

 I have performed micro code upgrades using ISSU on the Juniper platform.

 On Thu, Nov 8, 2012 at 4:52 PM, Kasper Adel karim.a...@gmail.com wrote:

 What i was asking is full ISSU, even with micro code. I assume between
 Major release there will be microcode upgrade most of the time.


 On Fri, Nov 9, 2012 at 2:48 AM, Phil bedard.p...@gmail.com wrote:

  The major vendors have figured it out for the most part by moving to
  stateful synchronization between control plane modules and implementing
  non-stop routing.
 
  ALU has supported ISSU on minor releases for many years and just added
  support for major releases.
 
  The Cisco Nexus ISSU works well, I've done an upgrade on a 5K switch and
  it was completely hitless.
 
  Juniper and Cisco with the 9K have gone through some hurdles but ISSU is
  actually usable now if the software versions support it.
 
  The main remaining hurdle is updating microcode on linecards, they still
  need to be rebooted after an upgrade.
 
  Phil
 
  On Nov 8, 2012, at 6:22 PM, Kasper Adel karim.a...@gmail.com wrote:
 
   Hello,
  
   We've been hearing about ISSU for so many years and i didnt hear that
 any
   vendor was able to achieve it yet.
  
   What is the technical reason behind that?
  
   If i understand correctly, the way it will be done would be simply to
  have
   extra ASICs/HW to be able to build dual circuits accessing the same
  memory,
   and gracefully switch from one to another. Is that right?
  
   Thanks,
   Kim
 





Re: Whats so difficult about ISSU

2012-11-08 Thread Oliver Garraux
I know some people here have mentioned good experiences with ISSU on
Nexus.   I don't doubt that it usually works right, but in my latest
experience with upgrading NX-OS on dual-SUP'ed 7k's, it was hitless
if, by hitless, you mean ~20% packet loss while troubleshooting with
TAC before we found that we had to remove and re-apply QoS policies
from every interface.

Also, depending on the update, linecards might have to be reset.

Oliver

-

Oliver Garraux
Check out my blog:  www.GetSimpliciti.com/blog
Follow me on Twitter:  twitter.com/olivergarraux


On Thu, Nov 8, 2012 at 8:00 PM, Kasper Adel karim.a...@gmail.com wrote:
 Does that mean they are the only vendor capable of doing this today?

 I am interested in the technology behind this if this is something public,
 any ideas?

 Thx

 On Friday, November 9, 2012, Kenneth McRae wrote:

 I have performed micro code upgrades using ISSU on the Juniper platform.

 On Thu, Nov 8, 2012 at 4:52 PM, Kasper Adel karim.a...@gmail.com wrote:

 What i was asking is full ISSU, even with micro code. I assume between
 Major release there will be microcode upgrade most of the time.


 On Fri, Nov 9, 2012 at 2:48 AM, Phil bedard.p...@gmail.com wrote:

  The major vendors have figured it out for the most part by moving to
  stateful synchronization between control plane modules and implementing
  non-stop routing.
 
  ALU has supported ISSU on minor releases for many years and just added
  support for major releases.
 
  The Cisco Nexus ISSU works well, I've done an upgrade on a 5K switch and
  it was completely hitless.
 
  Juniper and Cisco with the 9K have gone through some hurdles but ISSU is
  actually usable now if the software versions support it.
 
  The main remaining hurdle is updating microcode on linecards, they still
  need to be rebooted after an upgrade.
 
  Phil
 
  On Nov 8, 2012, at 6:22 PM, Kasper Adel karim.a...@gmail.com wrote:
 
   Hello,
  
   We've been hearing about ISSU for so many years and i didnt hear that
 any
   vendor was able to achieve it yet.
  
   What is the technical reason behind that?
  
   If i understand correctly, the way it will be done would be simply to
  have
   extra ASICs/HW to be able to build dual circuits accessing the same
  memory,
   and gracefully switch from one to another. Is that right?
  
   Thanks,
   Kim
 






Re: Whats so difficult about ISSU

2012-11-08 Thread Phil
Heh you will find vendors avoid using the term hitless.  I can't think of any 
router which supports ISSU that is truly hitless.  The ASR9K ISSU states it 
will sustain less than 6 seconds of loss...

ISSU is still rife with caveats and incompatibilities as well if you are doing 
more advanced things.

Phil

On Nov 8, 2012, at 8:22 PM, Oliver Garraux oli...@g.garraux.net wrote:

 I know some people here have mentioned good experiences with ISSU on
 Nexus.   I don't doubt that it usually works right, but in my latest
 experience with upgrading NX-OS on dual-SUP'ed 7k's, it was hitless
 if, by hitless, you mean ~20% packet loss while troubleshooting with
 TAC before we found that we had to remove and re-apply QoS policies
 from every interface.
 
 Also, depending on the update, linecards might have to be reset.
 
 Oliver
 
 -
 
 Oliver Garraux
 Check out my blog:  www.GetSimpliciti.com/blog
 Follow me on Twitter:  twitter.com/olivergarraux
 
 
 On Thu, Nov 8, 2012 at 8:00 PM, Kasper Adel karim.a...@gmail.com wrote:
 Does that mean they are the only vendor capable of doing this today?
 
 I am interested in the technology behind this if this is something public,
 any ideas?
 
 Thx
 
 On Friday, November 9, 2012, Kenneth McRae wrote:
 
 I have performed micro code upgrades using ISSU on the Juniper platform.
 
 On Thu, Nov 8, 2012 at 4:52 PM, Kasper Adel karim.a...@gmail.com wrote:
 
 What i was asking is full ISSU, even with micro code. I assume between
 Major release there will be microcode upgrade most of the time.
 
 
 On Fri, Nov 9, 2012 at 2:48 AM, Phil bedard.p...@gmail.com wrote:
 
 The major vendors have figured it out for the most part by moving to
 stateful synchronization between control plane modules and implementing
 non-stop routing.
 
 ALU has supported ISSU on minor releases for many years and just added
 support for major releases.
 
 The Cisco Nexus ISSU works well, I've done an upgrade on a 5K switch and
 it was completely hitless.
 
 Juniper and Cisco with the 9K have gone through some hurdles but ISSU is
 actually usable now if the software versions support it.
 
 The main remaining hurdle is updating microcode on linecards, they still
 need to be rebooted after an upgrade.
 
 Phil
 
 On Nov 8, 2012, at 6:22 PM, Kasper Adel karim.a...@gmail.com wrote:
 
 Hello,
 
 We've been hearing about ISSU for so many years and i didnt hear that
 any
 vendor was able to achieve it yet.
 
 What is the technical reason behind that?
 
 If i understand correctly, the way it will be done would be simply to
 have
 extra ASICs/HW to be able to build dual circuits accessing the same
 memory,
 and gracefully switch from one to another. Is that right?
 
 Thanks,
 Kim
 



Re: Whats so difficult about ISSU

2012-11-08 Thread Mikael Abrahamsson

On Thu, 8 Nov 2012, Phil wrote:

> The major vendors have figured it out for the most part by moving to 
> stateful synchronization between control plane modules and implementing 
> non-stop routing.


NSR isn't ISSU.

ISSU contains the words "in service". 6 seconds of outage isn't "in 
service". 0.5 seconds of outage isn't "in service". I could accept a few 
microseconds of outage as being ISSU, but tenths of seconds isn't "in 
service".


> The main remaining hurdle is updating microcode on linecards, they still 
> need to be rebooted after an upgrade.


... and as long as this is the case, there is no ISSU. There are only 
shorter outages during upgrade compared to a complete reboot.


--
Mikael Abrahamssonemail: swm...@swm.pp.se



Re: Whats so difficult about ISSU

2012-11-08 Thread Jonathan Lassoff
On Thu, Nov 8, 2012 at 8:13 PM, Mikael Abrahamsson swm...@swm.pp.se wrote:
 On Thu, 8 Nov 2012, Phil wrote:

 The major vendors have figured it out for the most part by moving to
 stateful synchronization between control plane modules and implementing
 non-stop routing.


 NSR isn't ISSU.

 ISSU contains the wording in service. 6 seconds of outage isn't in
 service. 0.5 seconds of outage isn't in service. I could accept a few
 microseconds of outage as being ISSU, but tenths of seconds isn't in
 service.


 The main remaining hurdle is updating microcode on linecards, they still
 need to be rebooted after an upgrade.


 ... and as long as this is the case, there is no ISSU. There is only
 shorter outages during upgrade compared to a complete reboot.

This.
There is some wonderfully reconfigurable router hardware out in the
world, and platforms that can dynamically program their forwarding
hardware make this seem possible.

It's possible to build things such that portions of a single box can
be upgraded at a time. With multiple links or forwarding paths out to
a remote destination, it seems to me that the upgrade process could
just coordinate things and update each piece of forwarding hardware in
turn, letting traffic cut over and waiting for it to come back before
moving on.

I could envision a Juniper M/TX box, where MPLS FRR or an ae
interface across FPCs could take backup traffic while a PFE is
upgraded.
Of course, every possible path would need to be able to survive an FPC
being down, and the process would have to have hooks into protocols to
know when everything is switched back.
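
To make the coordination idea concrete, here is a rough Python sketch of the check such a process would need before touching each FPC: every destination must still be reachable through some other FPC. It is purely illustrative; the function names, the paths mapping, and the FPC labels are hypothetical, and the actual drain, upgrade, and wait-for-convergence steps are left as a comment.

```python
from typing import Dict, List, Set


def safe_to_upgrade(fpc: str, paths: Dict[str, Set[str]]) -> bool:
    """True if every destination keeps at least one FPC up while `fpc` is out."""
    return all(fpcs - {fpc} for fpcs in paths.values())


def rolling_upgrade(fpcs: List[str], paths: Dict[str, Set[str]]) -> List[str]:
    """Walk the FPCs one at a time; return those skipped as unsafe.

    Assumes each FPC is back in service before the next one starts,
    so only one is ever out at a time.
    """
    skipped = []
    for fpc in fpcs:
        if not safe_to_upgrade(fpc, paths):
            skipped.append(fpc)
            continue
        # Hypothetical steps: drain traffic, upgrade the PFE, then wait for
        # the protocols to report that traffic has switched back.
    return skipped


if __name__ == "__main__":
    paths = {"dst-a": {"fpc0", "fpc1"}, "dst-b": {"fpc2"}}
    print(rolling_upgrade(["fpc0", "fpc1", "fpc2"], paths))
```

In this example dst-b has no second path, so the FPC carrying it would be flagged rather than upgraded in place.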



Re: Whats so difficult about ISSU

2012-11-08 Thread Juuso Lehtinen
In vendor-speak, ISSU usually refers to a 'minimal traffic impact' upgrade.
The definition of minimal varies from vendor to vendor and from upgrade to
upgrade, depending on which parts of the code need to be upgraded. In
general, traffic loss during ISSU is an order of magnitude less than when
reloading the whole box or line card as in a conventional upgrade.

At a high level, ISSU can be divided into two areas:
* Control plane / controller card software upgrade
* Forwarding plane / line card software upgrade

Control card software upgrade is the easy part. In a 1+1 controller design,
the standby controller card is upgraded first. Next, a control card
switchover is performed. Last, the remaining controller card is
upgraded.
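
The ordering above can be written down as a small Python sketch. It models only the sequence of steps for a 1+1 controller pair; the Controller class, the function name, and the version strings are hypothetical and do not correspond to any vendor's software.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Controller:
    name: str
    version: str
    active: bool


def issu_control_plane(a: Controller, b: Controller, new_version: str) -> List[str]:
    """Upgrade a 1+1 controller pair; return the ordered list of steps taken."""
    active, standby = (a, b) if a.active else (b, a)
    steps = []

    # 1. Upgrade the standby while the active card keeps running.
    standby.version = new_version
    steps.append(f"upgrade standby {standby.name}")

    # 2. Switch over to the freshly upgraded card.
    active.active, standby.active = False, True
    steps.append(f"switchover to {standby.name}")

    # 3. Upgrade the card that used to be active.
    active.version = new_version
    steps.append(f"upgrade former active {active.name}")
    return steps


if __name__ == "__main__":
    for step in issu_control_plane(Controller("RE0", "1.0", True),
                                   Controller("RE1", "1.0", False), "2.0"):
        print(step)
```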

Line card upgrade is the trickier part. At a high level, the line card can
be divided into a forwarding plane and a control plane (yes, there is a CPU
complex on line cards as well). The control plane part of the line card can
be upgraded separately and then restarted. If the line-card CPU is responsible
for generating OSPF hellos, the OSPF session might time out during the
restart. However, for most protocols, graceful restart extensions help avoid
any such issues. While the control plane is rebooting, the forwarding bits
on the line card continue packet forwarding.
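
Whether the OSPF session survives comes down to simple timer arithmetic, sketched below in Python. This is a simplified model under stated assumptions: the 10 s hello and 40 s dead interval are common defaults rather than values from this thread, the worst case assumes the neighbor last heard a hello one full hello interval before the restart began, and with graceful restart the neighbor is assumed to hold the adjacency for the advertised grace period. The function name is hypothetical.

```python
def adjacency_survives(restart_s: float,
                       hello_interval_s: float = 10.0,
                       dead_interval_s: float = 40.0,
                       grace_period_s: float = 0.0) -> bool:
    """True if hellos resume before the neighbor gives up on the adjacency."""
    if grace_period_s > 0:
        # Graceful restart: the neighbor waits for the grace period instead.
        return restart_s < grace_period_s
    # Worst case: the last hello was sent one full interval before the restart.
    return restart_s + hello_interval_s < dead_interval_s


if __name__ == "__main__":
    print(adjacency_survives(20))                        # short restart
    print(adjacency_survives(35))                        # too slow for the dead timer
    print(adjacency_survives(35, grace_period_s=120))    # covered by grace period
```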

The forwarding plane upgrade of the line card is the tricky part. This is
the part that will cause the 'short outage' during ISSU. If the code
upgrade needs to touch microcode or FPGA code, you will be seeing some
traffic loss. It is just the way these chips are built - you cannot
reprogram FPGA without taking the FPGA out of service first. The same
applies to network processors as well.

In theory you could duplicate these forwarding plane chips on line cards
and implement a simple switch in front of the PHY. However, I doubt any vendor
has gone this way, as it would push line card prices much higher.

If your SLAs are built so that no packet loss is acceptable, you need to
work around the ISSU limitations:
* Use line-level protection on adjacent line cards (LAG, APS1+1, MSP1+1) -
when the primary card goes down, the backup card will carry the traffic
* When upgrading a transit router, route traffic via a redundant path before
starting the transit router upgrade

BR,
 Juuso



On Thu, Nov 8, 2012 at 3:22 PM, Kasper Adel karim.a...@gmail.com wrote:

 Hello,

 We've been hearing about ISSU for so many years and i didnt hear that any
 vendor was able to achieve it yet.

 What is the technical reason behind that?

 If i understand correctly, the way it will be done would be simply to have
 extra ASICs/HW to be able to build dual circuits accessing the same memory,
 and gracefully switch from one to another. Is that right?

 Thanks,
 Kim



Re: Whats so difficult about ISSU

2012-11-08 Thread Saku Ytti
On (2012-11-09 01:22 +0200), Kasper Adel wrote:

 We've been hearing about ISSU for so many years and i didnt hear that any
 vendor was able to achieve it yet.
 
 What is the technical reason behind that?

I'd say code quality in routers is generally really, really bad; I'm not
sure why this is.
I think one problem is that we start from the premise that code will be written
correctly. When we start from that premise, we can do silly things like write
run-to-completion operating systems like IOS and JunOS (rpd), which means
a single person making one bad judgement call makes the whole OS bad.

Of course run-to-completion is the most efficient way to execute code if your
code is flawless, but that ship has sailed. Possibly when IOS started, CPU
time was at a premium and it was cheaper to throw code-review money at the
problem. 
But today it clearly is cheaper to add power to the control plane and have
levels of abstraction in the control plane which save the system from bad
code, i.e. design your control plane assuming the code you deliver isn't good.

Take a page from the Erlang team on design principles. I think Arista is
walking the right path. They have a (hopefully) stable and simple
state-storage process, from which separate processes can download their
state when they crash, which can make crashing virtually transparent to the
operator.
However, I think Arista is still running a single BGPd etc. I think you should
at least run iBGP and eBGP, or maybe even peer groups, in different daemons,
so when you get a bad UPDATE, it'll crash your eBGP sessions or one peer group
instead of all neighbours. Or of course if you keep TCP state and the various
BGP RIBs in a separate location, you won't need to tear down the TCP session just
because you crash.
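
The state-storage idea can be illustrated with a toy Python sketch: a small store that does nothing but hold state, and a worker that checkpoints into it and reloads on restart. This is only an illustration of the design principle, not how Arista or any other vendor implements it; the class names, the per-peer-group worker, and the example prefix are all hypothetical.

```python
class StateStore:
    """Deliberately simple component whose only job is to hold state."""

    def __init__(self):
        self._data = {}

    def save(self, owner, state):
        self._data[owner] = dict(state)   # store a copy, not a shared reference

    def load(self, owner):
        return dict(self._data.get(owner, {}))


class BgpWorker:
    """Hypothetical per-peer-group daemon that checkpoints into the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.routes = store.load(name)    # recover state on (re)start

    def learn(self, prefix, next_hop):
        self.routes[prefix] = next_hop
        self.store.save(self.name, self.routes)


if __name__ == "__main__":
    store = StateStore()
    worker = BgpWorker("ebgp-group-1", store)
    worker.learn("192.0.2.0/24", "198.51.100.1")
    del worker                                   # simulate a crash
    restarted = BgpWorker("ebgp-group-1", store)
    print(restarted.routes)
```

Because each worker owns only its own key in the store, a crash in one peer group's worker leaves the others untouched, which is the isolation argued for above.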

Someone might argue the overhead is too large, but is it though? MX routers
ship with a 4-core RP, of which you're using 1 core. The overhead isn't
that high.

Some people write positive things about ISSU in reply; the only box where I've
seen it work reliably is CAT4500 switches. I've not seen it working in
routers. On MX960 my personal hit/miss ratio is about 4/5 ISSUs working and 1/5
failing catastrophically, e.g. suddenly the PFE is dropping packets as if a
FW filter were applied, while none is. So we've stopped using ISSU.
The point of ISSU is that you're not sending change management notices to your
customers, so it positively has to work, or you're in breach of
contract.

-- 
  ++ytti