The use of UDP vs TCP is use-case specific. For example, are you logging and
don't care if you miss messages, or are you maintaining RIB state for
applications like SDN?
In terms of accurate logging (ordered regardless of timestamp) and maintaining
state, TCP is required; otherwise we introduce out-of-order delivery and
loss-recovery complexities. BGP PDU order is required to track changes and to
maintain accurate RIB state. While a SEQ number in BMP could help to
re-sequence messages, that puts a significant burden on every BMP
receiver/client. For example, BMP receivers would now have to buffer messages
and re-sequence them to ensure proper ordering when processing. If buffers are
exceeded, what happens to those messages, and how would the BMP receiver
request the missing messages/PDUs? Either way, this adds complexity to both
the sender and the receiver. IMO, this does not address the problem of RIB
dumps or of picking up where you left off on reconnect.
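To make the receiver-side burden concrete, here is a minimal sketch of the re-sequencing buffer every receiver would need to implement. This is entirely hypothetical: BMP has no SEQ field today, and the class name, buffer limit, and overflow behavior are invented for illustration.

```python
# Hypothetical sketch of the re-sequencing logic a BMP receiver would
# need if BMP messages carried a per-session SEQ number. Names and the
# buffer limit are illustrative, not from RFC 7854 or any draft.

class Resequencer:
    def __init__(self, max_buffered=1024):
        self.next_seq = 0            # next SEQ we can deliver in order
        self.pending = {}            # out-of-order messages, keyed by SEQ
        self.max_buffered = max_buffered

    def receive(self, seq, msg):
        """Buffer one message; return any messages now deliverable in order."""
        if seq < self.next_seq:
            return []                # duplicate of something already delivered
        self.pending[seq] = msg
        if len(self.pending) > self.max_buffered:
            # Buffer exceeded: with no way to re-request the gap, the
            # receiver can only drop state and force a full RIB dump.
            raise OverflowError("resequence buffer exhausted")
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered
```

Even this toy version has to answer the hard questions from above: how big should `max_buffered` be, and what happens when it overflows with no retransmission mechanism behind it.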
The draft suggests using TCP_FAST_OPEN, which I believe adds more complexity
while not solving the other challenges relating to RIB dumps/refreshes. It
doesn't address PEER UP handling on reconnect, how to handle peers that are
new or have changed during the no-connection window, or how to request a peer
refresh when needed. It also doesn't address the buffer exhaustion problem on
the sender (router) side. IMO, the sender should be configured with buffer
sizes per receiver, not based on time. How long a buffer lasts depends on the
update rate: a refresh to update policies will flood/exhaust buffers within
seconds, while normal updates may run for minutes without buffer exhaustion.
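As rough arithmetic (the buffer size, message size, and rates below are made-up numbers, not measurements), time-to-exhaustion for a fixed buffer is entirely a function of the update rate, which is why sizing by time is fragile:

```python
# Illustrative arithmetic only; all sizes and rates are invented.
BUFFER_BYTES = 64 * 1024 * 1024        # assumed 64 MiB per-receiver buffer
AVG_MSG_BYTES = 200                    # assumed average BMP message size

def seconds_until_exhaustion(msgs_per_sec):
    """How long the buffer lasts at a given sustained message rate."""
    return BUFFER_BYTES / (msgs_per_sec * AVG_MSG_BYTES)

# A policy refresh flooding updates vs. steady-state churn:
print(seconds_until_exhaustion(100_000))   # ~3.4 s during a refresh burst
print(seconds_until_exhaustion(500))       # ~11 minutes of normal churn
```

The same buffer that survives minutes of normal churn is gone in seconds during a refresh, so a time-based knob cannot be tuned for both.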
There are at least three problems with RIB dumps/reconnects to be solved:
1) Transient reconnects due to network failures, restarts of receivers, etc.
result in unnecessary INITs, RIB dumps, and PEER UPs. A PEER UP normally means
that the receiver invalidates all previous RIB entries, as it does not know
whether things were changed/removed during the gap (since the last PEER
UP/DOWN). A RIB dump is expected to refresh the peer RIB upon a PEER UP. IMO,
what we need is application-level control so the BMP receiver can send a
control message to the sender indicating what's needed per peer. For example,
a receiver restart (new connection) may require a full refresh of PEER UPs and
RIB dumps, a partial refresh on a subset of peers, only peers that are new
since the last reconnect, and/or no refresh at all for any of the given peers.
Unless a refresh/RIB dump is requested or needed, messages should continue
where they left off based on buffer allocations (e.g., offsets or similar).
IMO, the fast reconnect does not address the set of peers, especially the
peers that changed during the no-connect period.
2) Regardless of quick restarts or transient network problems, sometimes a RIB
dump and PEER UP are not needed. It would be nice to pick up where we left
off, if that is possible. This should be per peer instead of binary at the
session level, due to chatty peers causing buffering issues. This includes use
cases such as logging, where the logging process does not actually care about
a RIB dump at all. Instead, it only wants new messages starting at BMP
receiver connect time (per peer and AFI/SAFI).
3) BMP receivers sometimes need to get a refresh for a subset of peers
(similar to routers needing to reapply a policy). For example, a DB change may
result in some peers needing to be added again. Currently, the methods to
refresh are:
a) Go to the router and initiate a route-refresh, which is intrusive and
requires auth. Not great.
b) Reset the BMP TCP connection to trigger the router to refresh everything.
This is not ideal, as there can be hundreds of peers when only a small set
needs to be refreshed.
IMO, the same solution can solve all of the above. I would like to see a new
BMP message that the receiver sends on initial connect to indicate what's
needed. It's important to call out that not all peers (by AFI/SAFI) are equal
in terms of buffer exhaustion during a connection loss to a receiver. For
example, link-state, public/private peering with customers, etc. do not see
many updates over several minutes. An all-or-nothing approach based on a short
time window is not a desired solution (IMO) and leaves many other RIB dump
use cases unaddressed.
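To sketch what such a receiver-to-sender message could look like on the wire: nothing below is standardized; the mode values, field layout, and function name are all invented for illustration, and no such message exists in RFC 7854 or the draft.

```python
# Hypothetical per-peer refresh request a BMP receiver could send on
# (re)connect. All modes, field sizes, and names are invented.
import struct

FULL_REFRESH, PARTIAL_REFRESH, NEW_PEERS_ONLY, RESUME_ONLY = range(4)

def build_refresh_request(mode, peer_entries):
    """Encode a request; peer_entries is a list of (bgp_id, afi, safi)."""
    body = struct.pack("!BH", mode, len(peer_entries))   # mode + entry count
    for bgp_id, afi, safi in peer_entries:
        body += struct.pack("!IHB", bgp_id, afi, safi)   # 7 bytes per entry
    return body

# Ask for a refresh of just one IPv4-unicast peer (10.0.0.1):
req = build_refresh_request(PARTIAL_REFRESH, [(0x0A000001, 1, 1)])
```

The point of the per-entry (peer, AFI/SAFI) granularity is exactly the argument above: the receiver, not a session-wide timer, decides which peers need a RIB dump and which can simply resume.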
Thanks,
Tim
On 3/11/21, 8:45 PM, "GROW" wrote:
Hi Jakob,
➢ When processes abort unexpectedly, loss must be assumed unless data integrity
can be specifically proven.
Absolutely. We need to distinguish between application and transport. At
transport we do have sequence numbers and integrity on transport is ensured. On
BMP application it is not. Here we need to distinguish between BMP application
and BMP session. In a previous message to you I wrote:
➢ What I