Re: [OSPF] FW: Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

2016-10-13 Thread lizhenqi...@chinamobile.com
Speaking as a co-author from a network operator:

After the discussion, we have reached the common point listed in my previous 
mail. I repeat it here.
Production networks running OSPF DO have problems due to software 
implementation bugs or hardware defects. Those production network problems 
deserve proposals both to identify the router with bugs and to mitigate the 
problem, for example to reduce the impact of OSPF route flapping.
We are responsible for keeping the network running robustly, not for the 
router with bugs.

So, I support adoption of this document.



lizhenqi...@chinamobile.com
 
From: Acee Lindem (acee)
Date: 2016-10-12 02:51
To: OSPF WG List
Subject: [OSPF] FW: Solicit feedbacks on 
draft-dong-ospf-maxage-flush-problem-statement
Speaking as WG Co-Chair:

We had quite a lengthy discussion on this problem and whether or not it is 
something the WG should adopt. Please indicate whether or not you would support 
WG adoption before Oct 26th, 2016.

Thanks,
Acee 

From: "lizhenqi...@chinamobile.com" 
Date: Thursday, August 25, 2016 at 9:29 PM
To: Acee Lindem, Jie Dong, "Les Ginsberg (ginsberg)", OSPF WG List
Cc: "Zhangxudong (zhangxudong, VRP)" 
Subject: Re: Re: [OSPF] Solicit feedbacks on 
draft-dong-ospf-maxage-flush-problem-statement

Hi Acee,

Totally agree with you that we have to avoid significant modification to OSPF. 

The common point after the mail discussion is that production networks running 
OSPF DO have problems due to software implementation bugs or hardware defects. 
Those production network problems deserve proposals both to identify the 
router with bugs and to mitigate the problem, for example to reduce the impact 
of OSPF route flapping.

Your suggestion is one option for identifying the defective router. Thank you 
very much.

Best Regards,


lizhenqi...@chinamobile.com
 
From: Acee Lindem (acee)
Date: 2016-08-25 03:04
To: lizhenqi...@chinamobile.com; Dongjie (Jimmy); Les Ginsberg (ginsberg); 
ospf@ietf.org
CC: Zhangxudong (zhangxudong, VRP)
Subject: Re: [OSPF] Solicit feedbacks on 
draft-dong-ospf-maxage-flush-problem-statement
Speaking as WG member:

Hi Zhenjiang,

I don’t doubt that this was a very disquieting experience. However, I still 
don’t think we should attempt to change the protocol to compensate for routers 
that do not adhere to the protocol. To make an analogy, in my years of OSPF 
experience I’ve been subject to a number of bugs related to OSPF’s usage of 
local wire multicast (some triggered by obscure conditions such as routing and 
bridging on the same port). However, I’ve never proposed to not use local wire 
multicast. Also, after 25 years of OSPFv2, it doesn’t make sense to try and 
change the protocol to avoid bugs in this area. As for identifying the 
nefarious router, I think adding a counter and possibly a separate notification 
to the YANG model might be warranted since purging a non-self-originated LSA 
should not be a common occurrence in most networks. 

Thanks, 
Acee
P.S. Since this is an OSPF standards list, I’ve purposely avoided the questions 
as to how this catastrophic bug made it into a production network. 
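
The counter-plus-notification idea mentioned above can be sketched roughly as 
follows. This is only an illustration under stated assumptions, not the YANG 
model and not any vendor's implementation: the names Lsa, PurgeStats, 
record_purge and notify are hypothetical, and the sketch assumes the router 
calls record_purge each time it sets an LSA to MaxAge and floods it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Lsa:
    """Minimal stand-in for an LSA header (hypothetical type)."""
    advertising_router: str  # Router ID of the LSA's originator
    ls_id: str               # Link State ID


@dataclass
class PurgeStats:
    """Counts purges this router performs on LSAs it did not originate."""
    router_id: str
    notify: Callable[[str], None] = print  # hypothetical notification hook
    non_self_purges: int = 0

    def record_purge(self, lsa: Lsa) -> None:
        # Flushing one's own LSA is ordinary behavior; do not count it.
        if lsa.advertising_router == self.router_id:
            return
        # Purging someone else's LSA should be rare, so count it and
        # raise a notification that names the purging router.
        self.non_self_purges += 1
        self.notify(
            f"router {self.router_id} purged LSA {lsa.ls_id} "
            f"originated by {lsa.advertising_router}"
        )
```

With something along these lines, an operator could poll non_self_purges on 
each router (or listen for the notification) and look first at the router 
whose counter is climbing.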


From: "lizhenqi...@chinamobile.com" 
Date: Wednesday, August 24, 2016 at 2:11 PM
To: Jie Dong, Acee Lindem, "Les Ginsberg (ginsberg)", OSPF WG List
Cc: "Zhangxudong (zhangxudong, VRP)" 
Subject: Re: RE: [OSPF] Solicit feedbacks on 
draft-dong-ospf-maxage-flush-problem-statement

Hello Jie, Acee and Les,

I am a coauthor of this draft from the operator China Mobile. Thank you all 
for your discussion and suggestions in the previous mails. As you all 
discussed, a misbehaving OSPF router (due to a software or hardware problem) 
can cause severe problems across the whole OSPF domain.

Here I want to point out that OSPF route flapping DID occur in my field 
network, caused by a misbehaving OSPF router installed there. The procedure to 
analyze and look for the cause was very complicated because we did not know 
the source of the flushing. After two hours, we still could not identify the 
real cause and restore our network. The CPU utilization of the OSPF routers 
was high, network traffic decreased significantly, and many tunnel-down 
warnings were raised. When we tried shutting down one OSPF router, the route 
flapping stopped. This router was a newly deployed one. When we contacted our 
vendor, they admitted that this product had some defects in its handling of 
the OSPF protocol. Defects of this kind are difficult for us to catch in 
testing when products apply for entry into our network. Once defective 
products are deployed in the field network, locating the problem is very hard 
and time consuming.

So, I think it is necessary for us to solve the problem and improve the 
robustness of the protocol. At least it should provide the means to help us 
locate the OSPF route flapping problem.



lizhenqi...@chinamobile.com
 
From: Dongjie (Jimmy)
Date: 2016-08-18 17:09
To: Acee Lindem (acee); Les Ginsberg (ginsberg); ospf@ietf.org
CC: Zhangxudong (zhangxudong, VR

Re: [OSPF] FW: Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

2016-10-12 Thread Peter Psenak
No support. We should not modify the protocol to address possible bugs in 
the implementation.


thanks,
Peter

On 11/10/16 20:51, Acee Lindem (acee) wrote:

Speaking as WG Co-Chair:

We had quite a lengthy discussion on this problem and whether or not it is
something the WG should adopt. Please indicate whether or not you would
support WG adoption before Oct 26th, 2016.

Thanks,
Acee
