On Sun, Jun 21, 2015 at 11:39:30AM +0100, Paul Jakma wrote: > Hi Ondrej, > > On Thu, 18 Jun 2015, Ondrej Zajicek wrote: > > >Hello > > > >We noticed an incompatibility between Quagga and BIRD in > >OSPF loading phase: > > > >In short: > > > >1) BIRD sends LSREQ packet with ~60 LSR. > > > >2) Quagga answers with one LSUPD packet with ~20 LSA. > > > >3) BIRD waits for remaining LSUPD packets (to cover all requested LSAs) > >before sending new LSREQ. > > > >4) Usually no more LSUPD is sent by Quagga, therefore BIRD timeouts > >(RxmtInterval) before sending next LSREQ packet. This leads to > >significantly longer loading phase. > > That doesn't sound good. Do you still have the pcap?
I don't have pcap, i have log from the BIRD side (from Andrew <[email protected]>), describing sent and received packets and neighbor events. > >We noticed this problem with Quagga 0.99.22 and BIRD 1.5.0 > > > >My interpretation of RFC 2328 10.9 is that when LSREQ is received, it > >should be immediately answered by multiple LSUPD packets to cover all LSRs > >from LSREQ. And when LSREQ is sent, it should be completely answered > >before sending the next one. But i would say that this aspect of RFC 2328 > >is especially vague and i see other alternative interpretations. What is > >the opinion of Quagga developers w.r.t. this issue? > > The code concerned in Quagga has not changed in a long long time I think. We > did implement Ogier's optimisation to the DD Exchange process, but that > should be compatible with normal OSPF, and shouldn't affect the LSA > Req/Update process thereafter regardless I think. > > Quagga builds a list of all the LSAs to send when parsing the LSA Req. Then, > in a seperate green-thread that is scheduled without delay after that, it > should turn that list into a series of LSA Updates packets which are > scheduled to be written out in another green-thread. Glad to see that at least everybody agrees on interpretation of RFC 2328 10.9 . Are there some kind of artificial delays in the last thread (like to maintaining max packet rate per) between such LSA Update transmissions? > Quagga shouldn't delay sending those. > > Possibilities: > > * One of the LSAs sent by BIRD was not in Quagga's LSDB, so Quagga raised > a BadLSReq event, and should have returned to the ExStart neighbour-state. > You should see DD negotiation packets from Quagga then. That did not happen, as that would reset neighbor state even on the other side, which did not happen: 2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed state from Down to Init 2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed state from Init to 2-Way 2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed state from 2-Way to ExStart 2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed state from ExStart to Exchange 2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed state from Exchange to Loading 2015-06-16 22:56:40 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed state from Loading to Full > * One of the LSAs was too big to be sent in one MTU packet, Quagga tried > to send it in a fragmented LSA Update packet, but that didn't work and > got dropped somewhere under Quagga. Though, you should still have seen > the rest of the LSA Updates then - unless all remaining were very large > LSAs. > > Was it a network where routers had /lots/ of links? > > Unlikely... > > * Similar, all the remaining LSAs were too large to fit in an IP+OSPF > packet. We drop the packet and warn the admin. Unlikely... This is very unlikely, as all Type 1 LSAs from the first LSREQ packet were sent in the first LSUPD packet (together with others) and remaining LSAs were Type 3 and Type 4. > * Some kind of unfairness issue somewhere meant Quagga never got around to > either the "build the LSA Update response packets" green-thread, or else > the "write out the packet queue for that neighbour" green-thread (we do > prioritise some packets in the write-out-queue, e.g. Hellos - so > there's potential unfairness there that could affect things under high > loads). Unlikely, as the first LSA Update packets triggered by LS Request packets were sent immediately. > Or something else.... > > I don't see anything obvious unfortunately. > > Do you have more details on the network used? Do you remember if it was > reproducible? Well, i did not experience the issue, i am just BIRD developer trying to resolve the bug in BIRD/Quagga loading reported on BIRD mailing list. Here is the commented relevant part of the log (sent separately): 2015-06-16 22:54:55 <TRACE> ospf_ua: HELLO packet received from nbr 10.255.192.102 on vlan1 2015-06-16 22:54:55 <TRACE> ospf_ua: New neighbor 10.255.192.102 on vlan1, IP address 192.168.255.93 2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed state from Down to Init 2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed state from Init to 2-Way 2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed state from 2-Way to ExStart 2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet sent to nbr 10.255.192.102 on vlan1 2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet received from nbr 10.255.192.102 on vlan1 2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed state from ExStart to Exchange ... the concerned exchange started 2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet sent to nbr 10.255.192.102 on vlan1 2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet received from nbr 10.255.192.102 on vlan1 2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet sent to nbr 10.255.192.102 on vlan1 2015-06-16 22:54:55 <TRACE> ospf_ua: LSREQ packet sent to nbr 10.255.192.102 on vlan1 ... initial LSREQ sent 2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet received from nbr 10.255.192.102 on vlan1 2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet sent to nbr 10.255.192.102 on vlan1 2015-06-16 22:54:55 <TRACE> ospf_ua: LSUPD packet received from nbr 10.255.192.102 on vlan1 ... first part of answer to the initial LSREQ containing proper LSAs 2015-06-16 22:54:55 <TRACE> ospf_ua: LSUPD packet flooded via vlan1 ... propagation of upgraded self-originated LSA triggered by the old one received in the LSUPD above 2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet received from nbr 10.255.192.102 on vlan1 2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet sent to nbr 10.255.192.102 on vlan1 2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet received from nbr 10.255.192.102 on vlan1 2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet sent to nbr 10.255.192.102 on vlan1 ... long sequence of DBDES sent/received exchange .., I would already expect remaining LSUPD packets mixed-in here 2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet received from nbr 10.255.192.102 on vlan1 2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet sent to nbr 10.255.192.102 on vlan1 2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet received from nbr 10.255.192.102 on vlan1 2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet sent to nbr 10.255.192.102 on vlan1 2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed state from Exchange to Loading ... here the DBDES exchange is completed 2015-06-16 22:54:55 <TRACE> ospf_ua: LSUPD packet received from nbr 10.255.192.102 on vlan1 2015-06-16 22:54:55 <TRACE> ospf_ua: LSUPD packet received from nbr 10.255.192.102 on vlan1 ... now we received two more LSUPD packets [*], but these LSUPDs contain only some freshly originated Type 5 LSAs not required by the initial LSREQ. This is probably triggered by DBES exchange completion. Not sure if this may be the cause of problem, but probably no - they should be already sent 2015-06-16 22:54:55 <TRACE> ospf_ua: Non-HELLO packet received from unknown nbr 10.255.192.101 on vlan1, src 192.168.255.66 2015-06-16 22:54:56 <TRACE> ospf_ua: HELLO packet received from nbr xxx.xxx.132.13 on vlan1 2015-06-16 22:54:56 <TRACE> ospf_ua: HELLO packet received from nbr 10.255.192.101 on vlan1 2015-06-16 22:54:57 <TRACE> ospf_ua: DBDES packet sent to nbr 10.255.192.101 on vlan1 ... here starts some other initial neighbor exchange, but there is already 1-2 sec interval where nbr 10.255.192.102 sent nothing while it should send remaining LSUPDs 2015-06-16 22:54:57 <TRACE> ospf_ua: LSACK packet received from nbr 10.255.192.102 on vlan1 2015-06-16 22:54:59 <TRACE> ospf_ua: LSACK packet received from nbr 10.255.192.102 on vlan1 ... here are acknowledgemens for LSAs from the 2015-06-16 22:55:00 <TRACE> ospf_ua: LSUPD packet sent to nbr 10.255.192.102 on vlan1 2015-06-16 22:55:00 <TRACE> ospf_ua: LSUPD packet sent to nbr 10.255.192.102 on vlan1 2015-06-16 22:55:00 <TRACE> ospf_ua: LSUPD packet sent to nbr 10.255.192.102 on vlan1 2015-06-16 22:55:00 <TRACE> ospf_ua: LSREQ packet sent to nbr 10.255.192.102 on vlan1 ... another LSREQ sent after RxmtInterval -- Elen sila lumenn' omentielvo Ondrej 'Santiago' Zajicek (email: [email protected]) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."
signature.asc
Description: Digital signature
_______________________________________________ Quagga-dev mailing list [email protected] https://lists.quagga.net/mailman/listinfo/quagga-dev
