Re: BIRD drops specific IPv6 session for no reason

2020-02-28 Thread Alexander Zubkov
Hi,

Could it be an I/O issue? We had similar problems where BIRD spent too
long in an I/O loop, so the hold timers had already expired by the time
it finished. It was probably caused by writing a log file to a busy
HDD. We hit the same thing with syslog, because that write blocks BIRD
as well.
Nevertheless, in your case the OS should still have been replying
something in the TCP session, either acknowledging the segments or
showing that the window is full. As far as I know, BIRD does not have
its own TCP stack, so the OS is to blame for that part. It may be stuck
for some reason (or bug) or, as other people suggested, it may be
sending the packets somewhere else or not know where to send them.

On Fri, Feb 28, 2020 at 4:46 PM Ondrej Zajicek wrote:
>
> On Fri, Feb 28, 2020 at 03:33:06PM +0100, Stavros Konstantaras wrote:
> > HI Alarig,
> >
> > Thank you for sharing your experiences. I don’t have the MSS currently but 
> > if that was the case, wouldn’t have experienced the drops more frequently?
> > Currently it happens once per month (or 0.8 per month) and contrary to your 
> > case which was 100% network related, in our case we don’t even see the
> > reply packet being generated and leaving the box.
> >
> > What puzzles me also and based on the capture, is that I don’t see the 
> > TCP-ACK messages being sent to the customer. If BIRD opens a TCP socket
> > (not a simple RAW socket), I assume that the TCP connection will be handled 
> > by the OS and BIRD will push data segments (BGP keep alive messages) when 
> > ready.
> >
> > But as per output, I don’t see the TCP ack messages at all. Is BIRD 
> > handling the TCP communication as well?
>
> Hi
>
> That is a good point. BIRD uses regular TCP socket, so if you do not see
> TCP ack, then it is likely an underlying (kernel) issue. There were some
> reports of IPv6 issues in recent kernels [*]
>
> Also, the log message:
>
> Feb 20 21:46:11 rs1-mng bird6: 2001:7F8:1::A500:19:7727:1: Received: Hold timer expired
>
> shows that the notification message was received by the BIRD. The packet
> dump shows that keepalives were not sent by BIRD side. You could enable
> 'debug all' for given peer to see if BIRD tries to send keepalives. You
> could also monitor state of socket using 'ss' tool.
>
> [*] https://bird.network.cz/pipermail/bird-users/2020-February/014270.html
>
> --
> Elen sila lumenn' omentielvo
>
> Ondrej 'Santiago' Zajicek (email: santi...@crfreenet.org)
> OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
> "To err is human -- to blame it on a computer is even more so."
>



Re: BGP repropagation practices on not well known attributes

2020-02-28 Thread Mattia Milani

> First, handling of optional attributes depends on whether you know them,
> not on whether a peer knows them.
>
> Unknown optional transitive attributes are re-propagated, but with the
> 'partial' flag set. For details, see bgp_export_attr() function.


Thanks for the answer, I really appreciate it.

Yes, from the RFC I understand that unknown optional transitive attributes
are re-propagated.
I was more interested in the acceptance part. RFC 4271, section 5, 5th
paragraph says:


"... Paths with unrecognized transitive optional attributes SHOULD be 
accepted. ..."

This SHOULD was confusing me.

This brings me to another question.
Suppose I already know the destination d (which I learned with an
unknown optional transitive attribute and re-propagated with the
'partial' flag).
At some point I receive an UPDATE for the destination d, and the only
attribute that differs is the unknown optional transitive one.


At this point, can a BIRD node recognize in the first place that only the
unknown optional transitive attribute has changed? And in addition, if it
does recognize it, the UPDATE will be re-propagated with the optional
transitive attribute and the 'partial' flag set, am I right?


Thanks for the help,
Mattia



Re: Force gateway recursive lookup in iBGP routes

2020-02-28 Thread Ondrej Zajicek
On Fri, Feb 28, 2020 at 02:01:40PM +0100, Miroslav Kalina wrote:
> Hello there,
> 
> I am currently trying to use BIRD for route propagation from our
> baremetal Kubernetes clusters (Calico CNI, iBGP sessions within AS65100)
> into infrastructure via eBGP (private AS) and it works well.
> 
> The issue I have is when I want also to create BGP peering between BIRD
> and (MetalLB) service inside Kubernetes cluster (multihop, no NAT
> involved, session established OK) and I receive routes /32 with
> BGP.next_hop to IP within Kubernetes cluster (=not directly connected).
> These routes are marked as "unreachable" even if I explicitly set
> "gateway recursive".

Hello

The issue here is that BIRD does not support resolution of recursive
gateway through another route with recursive next hop. Recursive route
10.96.255.33/32 uses next hop 10.96.20.25, which is resolved through
10.96.20.0/26, which itself has a recursive next hop.

Perhaps you could modify okubedev1m1 / 10.96.20.0/26 to have direct next hop.
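
As a rough, untested sketch of that suggestion: the neighbor 10.30.21.19
lies inside the directly connected 10.30.20.0/22 network shown in the
table below, so the okubedev1m1 session could be declared 'direct'.
Routes learned from it (such as 10.96.20.0/26) would then get a direct
next hop instead of a recursive one. The protocol block is copied from
the original configuration; only the 'direct;' line is added, and whether
it fits the rest of the setup is an assumption.

```
protocol bgp okubedev1m1 {
	local 10.30.20.180 as 65100;
	neighbor 10.30.21.19 as 65100;

	# Added line (assumption): treat the iBGP neighbor as directly
	# connected, so received next hops are resolved as direct gateways.
	direct;

	passive yes;

	ipv4 {
		import filter bgp_in_okubedev1_calico;
		export none;
	};
}
```

With that in place, the MetalLB route 10.96.255.33/32 would be resolved
through a route that has a direct next hop, which is the case BIRD
supports.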

> Produces following routing table:
> 
> bird> show route all
> Table master4:
> 10.96.255.33/32      unreachable [okubedev1_lb1 10:32:14.973 from 10.96.20.25] * (100) [?]
> 	Type: BGP univ
> 	BGP.origin: Incomplete
> 	BGP.as_path: 
> 	BGP.next_hop: 10.96.20.25
> 	BGP.local_pref: 0
> 10.96.20.0/26        unicast [okubedev1m1 11:15:45.704] * (100) [i]
> 	via 10.30.21.19 on enp0s4
> 	Type: BGP univ
> 	BGP.origin: IGP
> 	BGP.as_path: 
> 	BGP.next_hop: 10.30.21.19
> 	BGP.local_pref: 100
> 10.30.20.0/22        unicast [direct1 13:52:46.301] * (240)
> 	dev enp0s4
> 	Type: device univ


-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."


Re: BIRD drops specific IPv6 session for no reason

2020-02-28 Thread Ondrej Zajicek
On Fri, Feb 28, 2020 at 03:33:06PM +0100, Stavros Konstantaras wrote:
> HI Alarig, 
> 
> Thank you for sharing your experiences. I don’t have the MSS currently but if 
> that was the case, wouldn’t have experienced the drops more frequently?
> Currently it happens once per month (or 0.8 per month) and contrary to your 
> case which was 100% network related, in our case we don’t even see the
> reply packet being generated and leaving the box. 
> 
> What puzzles me also and based on the capture, is that I don’t see the 
> TCP-ACK messages being sent to the customer. If BIRD opens a TCP socket 
> (not a simple RAW socket), I assume that the TCP connection will be handled 
> by the OS and BIRD will push data segments (BGP keep alive messages) when 
> ready.
> 
> But as per output, I don’t see the TCP ack messages at all. Is BIRD handling 
> the TCP communication as well? 

Hi

That is a good point. BIRD uses a regular TCP socket, so if you do not see
TCP ACKs, then it is likely an underlying (kernel) issue. There were some
reports of IPv6 issues in recent kernels [*]

Also, the log message:

Feb 20 21:46:11 rs1-mng bird6: 2001:7F8:1::A500:19:7727:1: Received: Hold timer expired

shows that the notification message was received by BIRD. The packet
dump shows that keepalives were not sent by the BIRD side. You could enable
'debug all' for the given peer to see if BIRD tries to send keepalives. You
could also monitor the state of the socket using the 'ss' tool.
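
A minimal sketch of the first suggestion, assuming a BGP protocol block
for the affected peer already exists in the configuration. The protocol
name rs_peer_example is hypothetical and the remaining options are
elided; only the 'debug all;' line is the point.

```
protocol bgp rs_peer_example {
	debug all;	# log all debugging messages for this protocol
	# ... existing local, neighbor, import and export options stay unchanged ...
}
```

The same can be switched on at run time from the BIRD client with
'debug rs_peer_example all', which avoids a reconfiguration.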

[*] https://bird.network.cz/pipermail/bird-users/2020-February/014270.html

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."



Re: BIRD drops specific IPv6 session for no reason

2020-02-28 Thread Stavros Konstantaras
Hi Tapio,

Good point as well, but I don’t have access to the customer’s router. I can only 
touch my Linux server and, judging from that, the ARP entry is there, as the BGPv4 
session remains up (which means that the switches in the middle have a 
valid MAC entry in their MAC tables). 

Only the BGPv6 session drops and when it drops, the log output does not really 
help:

Feb 20 21:46:11 rs1-mng bird6: 2001:7F8:1::A500:19:7727:1: Received: Hold timer expired
Feb 20 21:46:11 rs1-mng bird6: 2001:7F8:1::A500:19:7727:1: BGP session closed
Feb 20 21:46:11 rs1-mng bird6: 2001:7F8:1::A500:19:7727:1: State changed to stop
Feb 20 21:46:11 rs1-mng bird6: 2001:7F8:1::A500:19:7727:1: Down
Feb 20 21:46:11 rs1-mng bird6: 2001:7F8:1::A500:19:7727:1: State changed to down



Best regards,

Stavros Konstantaras | Sr. Network Engineer | AMS-IX 
M +31 (0) 620 89 51 04 | T +31 20 305 8999
ams-ix.net

> On 28 Feb 2020, at 14:08, Tapio Haapala wrote:
> 
> double check that your router have arp entry and route for that peer when 
> that happens. Example if your router get wrong route for peer it can send 
> response packets (or some cases arp requests) to wrong interface. So dump 
> your another interfaces also at same time and you will see what it do. 
> Probably watch for route and arp with proper grep and -n is also your friend 
> if that happens very often. 
> 
> On 28/02/2020 13.41, Stavros Konstantaras wrote:
>> Hi Bird community,
>> 
>> We are investigating a weird customer issue regarding our Bird Route Servers 
>> (version 1.6.3) and a specific IPv6 session. Customer reports a sudden drop 
>> of his IPv6 session and -until now- we could not relate those drops with any 
>> issue or instability. Everything seems normal and no other customer 
>> complained at the moment of the incident. 
>> 
>> 
>> 
>> After some packet capturing at the moment of the event, we discovered that 
>> BIRD does not send response messages to the customer’s BGP keepalive 
>> messages (see attached picture), which results in the BGP hold timer 
>> expiring and the session being dropped. We observed this anomaly with both 
>> RSs but at different time slots, and the tcpdump capture was running on the 
>> interface where BIRD sends all BGP traffic for customers. At the moment 
>> of the event, we didn’t do any maintenance or other RS-related work.
>> 
>> Has any of you experienced this in the past? If yes, how did you solve this?
>> Any related feedback is welcomed. 
>> 
>> Best regards,
>> 
>> Stavros Konstantaras | Sr. Network Engineer | AMS-IX 
>> M +31 (0) 620 89 51 04 | T +31 20 305 8999
>> ams-ix.net
>> 
> 
> 
> -- 
> F-Solutions Oy
> 
> Tapio Haapala
> 
> PL7, 90571 Oulu
> GSM   +358400998371
> Skype burner-
> IRC   Burner@ircnet



Re: BIRD drops specific IPv6 session for no reason

2020-02-28 Thread Stavros Konstantaras
Hi Alarig,

Thank you for sharing your experiences. I don’t have the MSS currently, but if 
that was the case, wouldn’t we have experienced the drops more frequently?
Currently it happens about once per month (or 0.8 times per month) and, contrary 
to your case, which was 100% network related, in our case we don’t even see the
reply packet being generated and leaving the box.

What also puzzles me, based on the capture, is that I don’t see the TCP ACK 
messages being sent to the customer. If BIRD opens a TCP socket 
(not a simple RAW socket), I assume that the TCP connection will be handled by 
the OS and BIRD will push data segments (BGP keepalive messages) when ready.

But as per the output, I don’t see the TCP ACK messages at all. Is BIRD handling 
the TCP communication as well?


But the MSS is a good point; I will try to check it as well at the next incident.
Thanks


Best regards,

Stavros Konstantaras | Sr. Network Engineer | AMS-IX 
M +31 (0) 620 89 51 04 | T +31 20 305 8999
ams-ix.net




> On 28 Feb 2020, at 14:12, Alarig Le Lay wrote:
> 
> Hi Stavros,
> 
> On ven. 28 févr. 12:41:24 2020, Stavros Konstantaras wrote:
>> Hi Bird community,
>> 
>> We are investigating a weird customer issue regarding our Bird Route
>> Servers (version 1.6.3) and a specific IPv6 session. Customer reports
>> a sudden drop of his IPv6 session and -until now- we could not relate
>> those drops with any issue or instability. Everything seems normal and
>> no other customer complained at the moment of the incident. 
>> 
>> 
>> 
>> After some packet capturing at the moment of the event, we discovered
>> that BIRD does not send response messages to the customer’s BGP
>> keepalive messages (see attached picture), which results in the BGP
>> hold timer expiring and the session being dropped. We observed this
>> anomaly with both RSs but at different time slots, and the tcpdump
>> capture was running on the interface where BIRD sends all BGP
>> traffic for customers. At the moment of the event, we didn’t do any
>> maintenance or other RS-related work.
>> 
>> Has any of you experienced this in the past? If yes, how did you solve
>> this?
>> Any related feedback is welcomed. 
> 
> Do you have the MSS used to establish the session? I had an issue about
> a session flapping with edgecast (verizonmedia) flapping on AMS-IX
> because both were having a MTU at 9216 on our port. But some switch
> didn’t like it well and sometime a packet is loss. If it’s the one
> containing the keepalive, the session goes down.
> 
> I resolved it by setting a MTU of 1514 on my side (which should have
> been since always).
> 
> Also, note that I’m not directly connected to the IXP, I’m using a
> reseller.
> 
> Regards,
> -- 
> Alarig



Re: BIRD drops specific IPv6 session for no reason

2020-02-28 Thread Alarig Le Lay
Hi Stavros,

On ven. 28 févr. 12:41:24 2020, Stavros Konstantaras wrote:
> Hi Bird community,
> 
> We are investigating a weird customer issue regarding our Bird Route
> Servers (version 1.6.3) and a specific IPv6 session. Customer reports
> a sudden drop of his IPv6 session and -until now- we could not relate
> those drops with any issue or instability. Everything seems normal and
> no other customer complained at the moment of the incident. 
> 
> 
> 
> After some packet capturing at the moment of the event, we discovered
> that BIRD does not send response messages to the customer’s BGP
> keepalive messages (see attached picture), which results in the BGP
> hold timer expiring and the session being dropped. We observed this
> anomaly with both RSs but at different time slots, and the tcpdump
> capture was running on the interface where BIRD sends all BGP
> traffic for customers. At the moment of the event, we didn’t do any
> maintenance or other RS-related work.
> 
> Has any of you experienced this in the past? If yes, how did you solve
> this?
> Any related feedback is welcomed. 

Do you have the MSS used to establish the session? I had an issue with
a session to Edgecast (Verizon Media) flapping on AMS-IX because both
of us had an MTU of 9216 on our ports. Some switch in between did not
handle that well, and occasionally a packet was lost. If it was the one
containing the keepalive, the session went down.

I resolved it by setting an MTU of 1514 on my side (which it should have
been all along).

Also, note that I’m not directly connected to the IXP, I’m using a
reseller.

Regards,
-- 
Alarig


Re: BIRD drops specific IPv6 session for no reason

2020-02-28 Thread Tapio Haapala
Double-check that your router has an ARP entry and a route for that peer when 
this happens. For example, if your router picks up a wrong route for the peer, 
it can send response packets (or, in some cases, ARP requests) out of the wrong 
interface. So capture on your other interfaces at the same time and you will see 
what it does. If this happens very often, running 'watch' with -n over the route 
and ARP tables, filtered with a suitable grep, is also your friend. 

On 28/02/2020 13.41, Stavros Konstantaras wrote:
> Hi Bird community,
> 
> We are investigating a weird customer issue regarding our Bird Route Servers 
> (version 1.6.3) and a specific IPv6 session. Customer reports a sudden drop 
> of his IPv6 session and -until now- we could not relate those drops with any 
> issue or instability. Everything seems normal and no other customer 
> complained at the moment of the incident. 
> 
> 
> 
> After some packet capturing at the moment of the event, we discovered that 
> BIRD does not send response messages to the customer’s BGP keepalive 
> messages (see attached picture), which results in the BGP hold timer expiring 
> and the session being dropped. We observed this anomaly with both RSs but at 
> different time slots, and the tcpdump capture was running on the interface 
> where BIRD sends all BGP traffic for customers. At the moment of the 
> event, we didn’t do any maintenance or other RS-related work.
> 
> Has any of you experienced this in the past? If yes, how did you solve this?
> Any related feedback is welcomed. 
> 
> Best regards,
> 
> Stavros Konstantaras | Sr. Network Engineer | AMS-IX 
> M +31 (0) 620 89 51 04 | T +31 20 305 8999
> ams-ix.net 
> 


-- 
F-Solutions Oy

Tapio Haapala

PL7, 90571 Oulu
GSM   +358400998371
Skype burner-
IRC   Burner@ircnet


Force gateway recursive lookup in iBGP routes

2020-02-28 Thread Miroslav Kalina
Hello there,

I am currently trying to use BIRD for route propagation from our
baremetal Kubernetes clusters (Calico CNI, iBGP sessions within AS65100)
into infrastructure via eBGP (private AS) and it works well.

The issue arises when I also want to create a BGP peering between BIRD
and a (MetalLB) service inside the Kubernetes cluster (multihop, no NAT
involved, session established OK) and I receive /32 routes with
BGP.next_hop set to an IP within the Kubernetes cluster (= not directly
connected). These routes are marked as "unreachable" even if I explicitly
set "gateway recursive".

I know this recursive gateway lookup works well for routes learned from
eBGP, but I can't make the Calico peers external because it won't work in my
setup. The Calico nodes have an iBGP full mesh, so I would receive all routes
from every single node and wouldn't be able to distinguish which lives
where.

Unfortunately, the BGP peer inside the cluster has no support for modifying
next_hop and always sends self, so I am looking for a workaround. Also, I
cannot set a specific "gw" in the import filter, because I have these
multihop peers configured with a BGP neighbor range subnet (there will be
multiple of them, and I don't know their exact IP addresses in advance).

A configuration snippet like this:

# calico cluster peers
filter bgp_in_okubedev1_calico {
	if net ~ [ 10.96.16.0/20+ ] then accept;
	reject;
}

protocol bgp okubedev1m1 {
	local 10.30.20.180 as 65100;
	neighbor 10.30.21.19 as 65100;

	passive yes;

	ipv4 {
		import filter bgp_in_okubedev1_calico;
		export none;
	};
}

# metallb multihop peers
filter bgp_in_okubedev1_metallb {
	# gw is recursively looked up locally and passed into BGP.next_hop
	#bgp_next_hop = gw;

	if net ~ [ 10.96.255.32/28+ ] then accept;
	reject;
}

protocol bgp okubedev1_lb_tpl {
	local 10.30.20.180 as 65100;
	neighbor range 10.96.16.0/20 as 65100;

	passive yes;
	multihop;

	ipv4 {
		gateway recursive;

		import filter bgp_in_okubedev1_metallb;
		export none;
	};
}


produces the following routing table:

bird> show route all
Table master4:
10.96.255.33/32      unreachable [okubedev1_lb1 10:32:14.973 from 10.96.20.25] * (100) [?]
	Type: BGP univ
	BGP.origin: Incomplete
	BGP.as_path: 
	BGP.next_hop: 10.96.20.25
	BGP.local_pref: 0
10.96.20.0/26        unicast [okubedev1m1 11:15:45.704] * (100) [i]
	via 10.30.21.19 on enp0s4
	Type: BGP univ
	BGP.origin: IGP
	BGP.as_path: 
	BGP.next_hop: 10.30.21.19
	BGP.local_pref: 100
10.30.20.0/22        unicast [direct1 13:52:46.301] * (240)
	dev enp0s4
	Type: device univ


I am almost sure I am missing some key BIRD or BGP feature, which I need
to know to understand this behavior properly.

Any comment or suggestion would be appreciated.

Best regards

-- 
Miroslav Kalina
Systems developement specialist

miroslav.kal...@livesport.eu
+420 773 071 848

Livesport s.r.o.
Aspira Business Centre
Bucharova 2928/14a, 158 00 Praha 5
www.livesport.eu