Re: bgpd VPNs broken in kroute with 7.2?

2022-11-04 Thread Bars Bars
Nice addition that makes things work!



On Fri, 4 Nov 2022 at 14:29, Claudio Jeker wrote:

> It seems that was only part of the fix for withdraws. There is another bug
> in the nlri parser where the code fails to properly jump over the implicit
> label.
>
> This diff seems to work for me. The util.c changes fix the problem when
> parsing the MP_UNREACH_NLRI attribute.
>
> I tested the v4 case but for v6 another can of worms showed up. It is not
> possible to configure IPv6 addrs on mpe(4) which makes it impossible to
> inject IPv6 VPN routes into the rdomain :(
> --
> :wq Claudio
>
> Index: kroute.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/bgpd/kroute.c,v
> retrieving revision 1.301
> diff -u -p -r1.301 kroute.c
> --- kroute.c  18 Oct 2022 09:30:29 -0000  1.301
> +++ kroute.c  4 Nov 2022 10:10:58 -0000
> @@ -580,6 +580,9 @@ krVPN4_change(struct ktable *kt, struct
> (kf->prefix.labelstack[2] << 8);
> mplslabel = htonl(mplslabel);
>
> +   kf->flags |= F_MPLS;
> +   kf->mplslabel = mplslabel;
> +
> /* for blackhole and reject routes nexthop needs to be 127.0.0.1 */
> if (kf->flags & (F_BLACKHOLE|F_REJECT))
> kf->nexthop.v4.s_addr = htonl(INADDR_LOOPBACK);
> @@ -590,6 +593,7 @@ krVPN4_change(struct ktable *kt, struct
> return (-1);
> } else {
> kr->mplslabel = mplslabel;
> +   kr->flags |= F_MPLS;
> kr->ifindex = kf->ifindex;
> kr->nexthop.s_addr = kf->nexthop.v4.s_addr;
> rtlabel_unref(kr->labelid);
> @@ -632,6 +636,9 @@ krVPN6_change(struct ktable *kt, struct
> (kf->prefix.labelstack[2] << 8);
> mplslabel = htonl(mplslabel);
>
> +   kf->flags |= F_MPLS;
> +   kf->mplslabel = mplslabel;
> +
> /* for blackhole and reject routes nexthop needs to be ::1 */
> if (kf->flags & (F_BLACKHOLE|F_REJECT))
> memcpy(&kf->nexthop.v6, &lo6, sizeof(kf->nexthop.v6));
> @@ -642,6 +649,7 @@ krVPN6_change(struct ktable *kt, struct
> return (-1);
> } else {
> kr6->mplslabel = mplslabel;
> +   kr6->flags |= F_MPLS;
> kr6->ifindex = kf->ifindex;
> memcpy(&kr6->nexthop, &kf->nexthop.v6, sizeof(struct in6_addr));
> kr6->nexthop_scope_id = kf->nexthop.scope_id;
> @@ -1878,9 +1886,11 @@ kroute_remove(struct ktable *kt, struct
>
> switch (kf->prefix.aid) {
> case AID_INET:
> +   case AID_VPN_IPv4:
> multipath = kroute4_remove(kt, kf, any);
> break;
> case AID_INET6:
> +   case AID_VPN_IPv6:
> multipath = kroute6_remove(kt, kf, any);
> break;
> default:
> Index: util.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/bgpd/util.c,v
> retrieving revision 1.71
> diff -u -p -r1.71 util.c
> --- util.c  17 Aug 2022 15:15:26 -0000  1.71
> +++ util.c  4 Nov 2022 10:08:24 -0000
> @@ -131,7 +131,9 @@ log_rd(uint64_t rd)
> snprintf(buf, sizeof(buf), "rd %s:%hu", inet_ntoa(addr),
> u16);
> break;
> default:
> -   return ("rd ?");
> +   snprintf(buf, sizeof(buf), "rd #%016llx",
> +   (unsigned long long)rd);
> +   break;
>
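
The util.c hunk above changes what is logged for a route distinguisher of
unknown type. To illustrate the format being logged (for example
"rd 65001:100"), here is a rough sketch of decoding a 64-bit RD using the
type layout from RFC 4364. rd_to_str() is a hypothetical helper that assumes
the RD is already in host byte order; it is not bgpd's log_rd().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a 64-bit route distinguisher (host byte order) as text.
 * Layout per RFC 4364: 16-bit type followed by 48 bits of value.
 * Hypothetical helper for illustration only.
 */
static const char *
rd_to_str(uint64_t rd, char *buf, size_t len)
{
	uint16_t type = rd >> 48;

	assert(buf != NULL && len > 0);
	switch (type) {
	case 0:	/* 2-byte AS number : 4-byte assigned number */
		snprintf(buf, len, "rd %u:%u",
		    (unsigned)((rd >> 32) & 0xffff),
		    (unsigned)(rd & 0xffffffff));
		break;
	case 1:	/* IPv4 address : 2-byte assigned number */
		snprintf(buf, len, "rd %u.%u.%u.%u:%u",
		    (unsigned)((rd >> 40) & 0xff),
		    (unsigned)((rd >> 32) & 0xff),
		    (unsigned)((rd >> 24) & 0xff),
		    (unsigned)((rd >> 16) & 0xff),
		    (unsigned)(rd & 0xffff));
		break;
	case 2:	/* 4-byte AS number : 2-byte assigned number */
		snprintf(buf, len, "rd %u:%u",
		    (unsigned)((rd >> 16) & 0xffffffff),
		    (unsigned)(rd & 0xffff));
		break;
	default: /* unknown type: dump the raw value instead of "rd ?" */
		snprintf(buf, len, "rd #%016llx", (unsigned long long)rd);
		break;
	}
	return buf;
}
```

Printing the raw value in the default case keeps an unexpected RD visible in
the log, which is the same idea as the replacement of the "rd ?" string in
the hunk.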

Re: bgpd VPNs broken in kroute with 7.2?

2022-11-04 Thread Bars Bars
Hi, Claudio!

It seems there were at least two issues:
1. VPN routes were never installed into the FIB (send_rtmsg got errno
'Network is unreachable' when it tried to writev them)
2. kroute_remove breaks when a prefix withdraw comes from the RDE (with
'not handled AID')

I applied your patch and VPN routes now get installed into the FIB!
But kroute_remove still cannot handle the withdrawal of VPN prefixes.
I manually triggered a prefix withdraw on the other side of the BGP session
and logged the prefix in kroute_remove just before it returned -1.
"
kroute_remove: rd 65001:100 10.42.200.9/32 NH ???
kroute_remove: not handled AID
"
So I extended the patch a bit and issue 2 seems to be gone:
(I am not sure that I did it right. I also don't know whether
kf->nexthop = '???' is OK in kroute_remove during withdrawal, but the FIB
reflects the change correctly.)

Index: kroute.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/kroute.c,v
retrieving revision 1.300
diff -u -p -r1.300 kroute.c
--- kroute.c  21 Sep 2022 21:12:03 -0000  1.300
+++ kroute.c  4 Nov 2022 06:51:21 -0000
@@ -580,6 +580,9 @@ krVPN4_change(struct ktable *kt, struct
 (kf->prefix.labelstack[2] << 8);
 mplslabel = htonl(mplslabel);

+   kf->mplslabel = mplslabel;
+   kf->flags |= F_MPLS;
+
 /* for blackhole and reject routes nexthop needs to be 127.0.0.1 */
 if (kf->flags & (F_BLACKHOLE|F_REJECT))
 kf->nexthop.v4.s_addr = htonl(INADDR_LOOPBACK);
@@ -590,6 +593,7 @@ krVPN4_change(struct ktable *kt, struct
 return (-1);
 } else {
 kr->mplslabel = mplslabel;
+   kr->flags |= F_MPLS;
 kr->ifindex = kf->ifindex;
 kr->nexthop.s_addr = kf->nexthop.v4.s_addr;
 rtlabel_unref(kr->labelid);
@@ -632,6 +636,9 @@ krVPN6_change(struct ktable *kt, struct
 (kf->prefix.labelstack[2] << 8);
 mplslabel = htonl(mplslabel);

+   kf->mplslabel = mplslabel;
+   kf->flags |= F_MPLS;
+
 /* for blackhole and reject routes nexthop needs to be ::1 */
 if (kf->flags & (F_BLACKHOLE|F_REJECT))
 memcpy(&kf->nexthop.v6, &lo6, sizeof(kf->nexthop.v6));
@@ -642,6 +649,7 @@ krVPN6_change(struct ktable *kt, struct
 return (-1);
 } else {
 kr6->mplslabel = mplslabel;
+   kr6->flags |= F_MPLS;
 kr6->ifindex = kf->ifindex;
 memcpy(&kr6->nexthop, &kf->nexthop.v6, sizeof(struct in6_addr));
 kr6->nexthop_scope_id = kf->nexthop.scope_id;
@@ -1878,9 +1886,11 @@ kroute_remove(struct ktable *kt, struct

 switch (kf->prefix.aid) {
 case AID_INET:
+   case AID_VPN_IPv4:
 multipath = kroute4_remove(kt, kf, any);
 break;
 case AID_INET6:
+   case AID_VPN_IPv6:
 multipath = kroute6_remove(kt, kf, any);
 break;
 default:
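
The kroute_remove hunk amounts to letting the VPN address families fall
through to the same removal path as plain IPv4 and IPv6. A minimal,
self-contained sketch of that dispatch pattern follows. The enum and the
remove_v4()/remove_v6() stand-ins are hypothetical and only mimic the names
seen in the diff; they are not bgpd's actual code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical address-family IDs, named after those in the diff. */
enum aid { AID_UNSPEC, AID_INET, AID_INET6, AID_VPN_IPv4, AID_VPN_IPv6 };

/* Stand-ins for kroute4_remove()/kroute6_remove(); return the table used. */
static int remove_v4(void) { return 4; }
static int remove_v6(void) { return 6; }

static int
route_remove(enum aid aid)
{
	switch (aid) {
	case AID_INET:
	case AID_VPN_IPv4:	/* VPNv4 routes share the IPv4 removal path */
		return remove_v4();
	case AID_INET6:
	case AID_VPN_IPv6:	/* VPNv6 routes share the IPv6 removal path */
		return remove_v6();
	default:
		/* Without the extra case labels, VPN AIDs would end up here. */
		fprintf(stderr, "route_remove: not handled AID\n");
		return -1;
	}
}
```

In this sketch a VPN AID only reaches the error branch if its case label is
missing, which matches the 'not handled AID' message reported above.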


On Thu, 3 Nov 2022 at 16:43, Claudio Jeker wrote:

> Please try the following diff. It should fix the problem with MPLS routes.
>
> --
> :wq Claudio
>
> Index: kroute.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/bgpd/kroute.c,v
> retrieving revision 1.301
> diff -u -p -r1.301 kroute.c
> --- kroute.c  18 Oct 2022 09:30:29 -0000  1.301
> +++ kroute.c  3 Nov 2022 13:42:11 -0000
> @@ -580,6 +580,9 @@ krVPN4_change(struct ktable *kt, struct
> (kf->prefix.labelstack[2] << 8);
> mplslabel

bgpd VPNs broken in kroute with 7.2?

2022-10-31 Thread Bars Bars
Hi!

I just upgraded to 7.2 and bgpd began to crash with VPNs. It does not crash
immediately but about 1 minute after the daemon starts (probably when a
prefix withdraw is received and the RDE goes to change the FIB, but I am
not sure).
If I only use IPv4 sessions and keep the VPN sessions down, it runs
stably.
"
kroute_remove: not handled AID
peer closed imsg connection
SE: Lost connection to parent
peer closed imsg connection notification: Cease, administratively down
fatal in RTR: Lost connection to parent
peer closed imsg connection
fatal in RDE: Lost connection to parent
"
I'm not sure that this is a bug, but there was a huge kroute refactoring in
the bgpd source tree since 7.1, and it seems that routes with VPN4/VPN6 AIDs
are now handled very differently. I'm not good enough at code to
investigate and try to fix the issue, so I simply rolled bgpd/bgpctl back
to the 7.1 base revision and rebuilt; it works now.
I can't imagine what else I can do.


Re: bgpd path selection seems broken at now

2020-10-16 Thread Bars Bars
HCI root hub" rev
1.00/1.00 addr 1
usb5 at uhci3: USB revision 1.0
uhub5 at usb5 configuration 1 interface 0 "Intel UHCI root hub" rev
1.00/1.00 addr 1
usb6 at uhci4: USB revision 1.0
uhub6 at usb6 configuration 1 interface 0 "Intel UHCI root hub" rev
1.00/1.00 addr 1
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vmm0 at mainbus0: VMX/EPT (using slow L1TF mitigation)
urndis0 at uhub4 port 2 configuration 2 interface 0 "IBM RNDIS/CDC ETHER"
rev 2.00/2.15 addr 2
urndis0: using Vendor: interface alternate setting 0 failed
uhidev0 at uhub6 port 1 configuration 1 interface 0 "Chicony HP Business
Slim Keyboard" rev 2.00/0.11 addr 2
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0
uhidev1 at uhub6 port 1 configuration 1 interface 1 "Chicony HP Business
Slim Keyboard" rev 2.00/0.11 addr 2
uhidev1: iclass 3/0, 5 report ids
uhid0 at uhidev1 reportid 1: input=2, output=0, feature=0
uhid1 at uhidev1 reportid 2: input=1, output=0, feature=0
uhid2 at uhidev1 reportid 3: input=2, output=0, feature=0
uhid3 at uhidev1 reportid 5: input=0, output=0, feature=5
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
root on sd0a (1eb5f368a5140548.a) swap on sd0b dump on sd0b
bnx0: address e4:1f:13:e5:21:88
brgphy0 at bnx0 phy 1: BCM5709 10/100/1000baseT PHY, rev. 8
bnx1: address e4:1f:13:e5:21:8a
brgphy1 at bnx1 phy 1: BCM5709 10/100/1000baseT PHY, rev. 8

# bgpctl sh ip bgp VPNv4 ext rt 65000:1081 det | grep -A6 10.54.8.0/24 | grep -Ev Nexthop
BGP routing table entry for rd 65000:1081 10.54.8.0/24
65000 64100
Origin IGP, metric 0, localpref 100, weight 0, ovs not-found, external, valid, best
Last update: 23:24:29 ago
Communities: 65000:81
Ext. Communities: rt 65000:1081
BGP routing table entry for rd 65000:1082 10.54.8.0/24
65000 64100
Origin IGP, metric 0, localpref 90, weight 0, ovs not-found, external, valid, best
Last update: 23:24:28 ago
Communities: 65000:82
Ext. Communities: rt 65000:1081 rt 65000:1082

I don't know how to verify the label of a VPNv4 prefix received by bgpd
with the bgpctl command, so I'm checking it with route(8).
Label 120 is the one attached to the prefix with rd 65000:1082, as
advertised by the remote peer.
That prefix, which I set to localpref 90 with a filter, should be the worst
one, but it wins. It seems that the newest prefix always wins within VPNv4.
If I make the other prefix the newest one (by simply re-advertising the
currently oldest prefix), then the label changes and the newest prefix wins
again.

# route -nT450 get 10.54.8.0
   route to: 10.54.8.0
destination: 10.54.8.0
       mask: 255.255.255.0
  interface: mpe450
 if address: 10.0.3.1
 mpls label: PUSH 120
   priority: 48 (bgp)
      flags:
     use       mtu    expire
       0         0         0
  sockaddrs:
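
For reference, the label shown as "PUSH 120" travels in a VPNv4 NLRI as a
3-byte label stack entry: 20 bits of label, 3 reserved/traffic-class bits,
and a bottom-of-stack bit (RFC 8277). The sketch below decodes such an entry.
The helper names are hypothetical, and label_shim() only mirrors the shift
pattern visible in the kroute.c diff context elsewhere in this archive
(labelstack[2] << 8); it is not taken from bgpd itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extract the 20-bit label value from one 3-byte label stack entry. */
static uint32_t
label_value(const uint8_t stack[3])
{
	assert(stack != NULL);
	return ((uint32_t)stack[0] << 12) | ((uint32_t)stack[1] << 4) |
	    (stack[2] >> 4);
}

/* The bottom-of-stack flag is the lowest bit of the third byte. */
static int
label_is_bottom(const uint8_t stack[3])
{
	return stack[2] & 0x01;
}

/*
 * Place the three bytes in the upper 24 bits of a 32-bit word (host byte
 * order). A caller would still convert with htonl() before handing the
 * value to the kernel, as the diff context does.
 */
static uint32_t
label_shim(const uint8_t stack[3])
{
	return ((uint32_t)stack[0] << 24) | ((uint32_t)stack[1] << 16) |
	    ((uint32_t)stack[2] << 8);
}
```

With the bytes 0x00 0x07 0x81, label_value() yields 120 and the
bottom-of-stack bit is set, so comparing labels this way is one possible
means of telling which of the two RDs actually ended up in the FIB.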


# cat /etc/bgpd-vpn450.conf
match from 192.168.0.1 ext-community rt 65000:1081 ext-community rt 65000:1082 set localpref 90
vpn "vpn1081" on mpe450 {
        rd 65000:1081
        import-target rt 65000:1081
        export-target rt 65000:1081
        network 10.0.3.0/30 set community 65001:450
        fib-update yes
}


On Thu, 15 Oct 2020 at 23:50, Sebastian Benoit wrote:

> Hi,
>
> I think it would help if you could send your configuration file, or at
> least
> the bgpctl commands that show the problem.
> Also please send a dmesg so we know what version you are running.
>
> Thanks,
> Benno
>
>

Re: bgpd path selection seems broken at now

2020-10-12 Thread Bars Bars
To be clearer, I mean this commit to rde_rib.c, but again I'm not sure.
https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.sbin/bgpd/rde_rib.c?rev=1.189&content-type=text/x-cvsweb-markup



bgpd path selection seems broken at now

2020-10-12 Thread Bars Bars
Hi,

First, I have to say that it is really hard for me to follow the prefix
comparison calls, because so many different comparisons are done in
there, like
prefix_cmp()/prefix_compare()/pt_prefix_cmp()/path_compare()/etc. So I
may be totally wrong in my conclusions...

But anyway, it currently seems (on 6.6-stable and 6.7-stable, at least)
that path selection is not performed, at least for VPNv4 (I have not
tried other AIDs yet), when the same prefix is received with different RDs
and imported into the FIB. Normally, path selection should happen at this
stage (or ECMP multipath should be used, if supported). Instead, both
prefixes are present in the rdomain's RIB, and only the newest one is used
as active even if others have better attributes (route-age evaluation is
not enabled by default).
I tried to find out whether this is caused by the different RDs or by a
broken comparison itself. My guess is that things broke somewhere around
the commit of revision 1.189 (tagged as "bgpd adj-rib-out rewrite"),
because since then there is a static prefix_cmp() in the rde_rib
translation unit, so functions like
prefix_add()/prefix_move()/prefix_update() no longer use the prefix_cmp()
defined in the rde_decide translation unit, which is the one that actually
does BGP path selection.

The peer setup is:
an eBGP peer advertises multiple VPNv4 prefixes with different RDs to the
bgpd peer, say,
rd1:x.x.x.x/y
rd2:x.x.x.x/y
rd3:x.x.x.x/y
where the x.x.x.x/y are equal but have different AS path lengths, or a
localpref changed in the bgpd config with a filter that sets attributes.
The prefix that was advertised last becomes the active prefix in the FIB.

Again, maybe I don't understand something, but it does not work.
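
To make the expectation concrete: the ordering the report assumes is the
usual start of BGP best-path selection, where a higher localpref wins and,
on a tie, the shorter AS path wins. The sketch below shows only those two
steps. struct path and path_better() are hypothetical names for
illustration and are not bgpd's prefix_cmp(); a real decision process has
more tie-breakers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down path attributes; not bgpd's structures. */
struct path {
	uint32_t localpref;	/* higher value is preferred */
	uint32_t aspath_len;	/* shorter AS path is preferred */
};

/* Returns >0 if a is better than b, <0 if worse, 0 if tied on both. */
static int
path_better(const struct path *a, const struct path *b)
{
	assert(a != NULL && b != NULL);
	if (a->localpref != b->localpref)
		return a->localpref > b->localpref ? 1 : -1;
	if (a->aspath_len != b->aspath_len)
		return a->aspath_len < b->aspath_len ? 1 : -1;
	return 0;
}
```

Under this ordering, a path with localpref 100 beats one with localpref 90
regardless of which was received last, which is the behaviour the report
says it does not observe.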


Re: RT_TABLEID_MAX behavior changed?

2020-05-19 Thread Bars Bars
Thanks, I got it.
So I should probably manually patch, recompile and install the kernel again.
Should I then care about boot-time kernel relinking if I have such a
custom kernel?

Claudio, please do not forget to advise on the question about
RT_TABLEID_MAX itself.
I hope I clarified what I'm talking about. If not, please ask me.

On Tue, 19 May 2020 at 15:05, Theo de Raadt wrote:

> Bars Bars wrote:
>
> > Thank you very much.
> >
> > Do you mean I should not use syspatch if I modified the kernel sources?
>
> syspatches can deliver replacements for kernel .o files
>
> So if you have changed a .h or .c file, the syspatches are not
> going to work correctly.
>
> Once you use source-code methods, you can't use those binary
> methods.
>


Re: RT_TABLEID_MAX behavior changed?

2020-05-19 Thread Bars Bars
Thank you very much.

Do you mean I should not use syspatch if I modified the kernel sources?
When reading the KARL notes I thought that it is only incompatible with a
kernel changed with config -e or through the kernel configuration file,
neither of which I modified.

What about kernel relinking at boot time?
The kernel changes were working after reboot, before I applied the syspatch.
So it seems the boot-time relinking script took the right base kernel to
relink.

And as I understand it, to fix the current state it is enough to rebuild
only the kernel, without userland?

On Tue, 19 May 2020 at 12:41, Claudio Jeker wrote:

> On Tue, May 19, 2020 at 11:21:13AM +0300, Bars Bars wrote:
> > It seems I figured out why userland was 'broken' on the recompiled
> > kernel with the changed RT_TABLEID_MAX.
> > I don't think things are really broken; maybe I am not doing them the
> > right way, please advise.
> >
> > I could reproduce the issue (all steps were done exactly as in the
> > openbsd.org FAQ).
> > I changed RT_TABLEID_MAX, recompiled the kernel, booted from it, and
> > the change didn't work in userland.
> > Then I rebuilt userland and rebooted, and everything works now.
> > Now if I apply some patch from the errata, the kernel gets relinked,
> > and right after that the kernel change doesn't work.
> > Is this expected behavior? How can I fix it? syspatch -r doesn't help.
> >
>
> You cannot syspatch a system with a custom kernel. You need to apply
> the patches yourself. syspatch only works for non-modified kernels.
> It should actually check this by making sure that the kernel signature is
> correct, so I am not sure what exactly happened, but I guess you never
> properly installed your kernel including the relink directory, and so
> syspatch relinked a default kernel over your modified one.
>
> --
> :wq Claudio
>


Re: RT_TABLEID_MAX behavior changed?

2020-05-19 Thread Bars Bars
As I understand it, kernel relinking uses the current object files and
installs a new kernel.
What I do not understand is how it could 'roll back' kernel changes made
during compilation if it uses the current object files, which were built
during that compilation.

Just one note: I have not rebooted yet since the relinking was done,
because the created domains are working (the traffic on them); only
userland does not. I'm afraid they will fail to be created after a reboot
and I will have to repeat the userland rebuild.



Re: RT_TABLEID_MAX behavior changed?

2020-05-19 Thread Bars Bars
It seems I figured out why userland was 'broken' on the recompiled kernel
with the changed RT_TABLEID_MAX.
I don't think things are really broken; maybe I am not doing them the right
way, please advise.

I could reproduce the issue (all steps were done exactly as in the
openbsd.org FAQ).
I changed RT_TABLEID_MAX, recompiled the kernel, booted from it, and the
change didn't work in userland.
Then I rebuilt userland and rebooted, and everything works now.
Now if I apply some patch from the errata, the kernel gets relinked, and
right after that the kernel change doesn't work.
Is this expected behavior? How can I fix it? syspatch -r doesn't help.



On Mon, 18 May 2020 at 13:31, Bars Bars wrote:

> To be more precise: when I said that the limit became shorter, I was
> referring to struct dommp in sys/net/rtable.c.
>
> struct dommp {
>         unsigned int   limit;
>         /*
>          * Array to get the routing domain and loopback interface related to
>          * a routing table. Format:
>          *
>          * 8 unused bits | 16 bits for loopback index | 8 bits for rdomain
>          */
>         unsigned int  *value;
> };
>
> In the past the maximum value was limited to u_int16_t in some deep places,
> but nowadays only 8 bits are allocated to it according to the struct, plus
> 8 unused bits which I hope I can safely add to the allocation.
> I am worried that these unused bits are not guaranteed to users, so the
> actual limit is 8 bits instead of the 16 in earlier releases.
>
>
>
> On Mon, 18 May 2020 at 11:51, Bars Bars wrote:
>
>> Hi, Claudio
>>
>> I mean these in sys/socket.h
>> /*
>>  * Maximum number of alternate routing tables
>>  */
>> #define RT_TABLEID_MAX  8000
>> #define RT_TABLEID_BITS 16
>> #define RT_TABLEID_MASK 0x
>>
>>
>> On Mon, 18 May 2020 at 10:18, Claudio Jeker wrote:
>>
>>> On Sun, May 17, 2020 at 10:16:28PM +0300, Bars Bars wrote:
>>> > It seems things work only when I rebuild userland completely (I'm
>>> > pretty sure that in the past I only compiled the kernel, correct me
>>> > if I'm wrong?).
>>> >
>>> > By the way, a question for the devs.
>>> > Looking at the CVS history, I am really worried that you have not
>>> > expanded the rt_tableid_max limit for years; moreover, it is now
>>> > actually 8 bits shorter than it was before the loopback-to-rdomain
>>> > map. There are many people with more than that number of VPNs, for
>>> > example if they run a centralized VPN setup, or use the box as an
>>> > inter-AS border router.
>>>
>>> Sorry, your mail is incredibly imprecise and unclear. There is no
>>> rt_tableid_max in OpenBSD, at least not in my tree (grep -r rt_tableid_max
>>> returned nothing). So I have no idea what you are talking about and am
>>> therefore not able to give you a better answer.
>>>
>>> > On Sun, 17 May 2020 at 10:25, Bars Bars wrote:
>>> >
>>> > > Hey, guys.
>>> > >
>>> > > I always used rt_tableid_max expanded to the 16-bit range in past
>>> > > 5.x releases, and after rebuilding the kernel it worked immediately.
>>> > > But now I installed 6.6 on a new system, and after changing
>>> > > rt_tableid_max (and the new rt_tableid_mask and bits values too), my
>>> > > whole userland throws an "rtable / rdomain too large" error.
>>> > > Is there a behaviour change?
>>> > > The only thing that changed (as far as I know) is the new
>>> > > net/rtable.c struct that maps loopback to domain, where there are
>>> > > only 8 unused bits by which I can expand the tableid value.
>>> > >
>>> > >
>>>
>>> --
>>> :wq Claudio
>>>
>>


segmentation fault when using libmongoc call to strlen

2020-05-18 Thread Bars Bars
Hi,

Quite often an application dumps core when sending a MongoDB query with one
of the mongoc driver's write functions such as insert or update, or even
when iterating a cursor returned from the mongoc driver. Yet absolutely
identical queries may also complete with no issues.

Judging by the gdb trace below (maybe the question is not quite correct,
but): is it possible to identify whether the issue is on the system side or
the driver side? It seems that the issue is somewhere around mongoc's call
to strlen.
Sorry if I missed something.

The gdb trace:

#0 strlen () at /usr/src/lib/libc/arch/amd64/string/strlen.S:125
#1 0x01ccf0f4db22 in _mongoc_handshake_build_doc_with_application () from /usr/local/lib/libmongoc-1.0.so.0.0
#2 0x01ccf0f816d1 in _build_ismaster_with_handshake () from /usr/local/lib/libmongoc-1.0.so.0.0
#3 0x01ccf0f815af in _mongoc_topology_scanner_get_ismaster () from /usr/local/lib/libmongoc-1.0.so.0.0
#4 0x01ccf0f82c08 in _begin_ismaster_cmd () from /usr/local/lib/libmongoc-1.0.so.0.0
#5 0x01ccf0f82a7d in mongoc_topology_scanner_node_setup_tcp () from /usr/local/lib/libmongoc-1.0.so.0.0
#6 0x01ccf0f82203 in mongoc_topology_scanner_node_setup () from /usr/local/lib/libmongoc-1.0.so.0.0
#7 0x01ccf0f8336b in mongoc_topology_scanner_start () from /usr/local/lib/libmongoc-1.0.so.0.0
#8 0x01ccf0f7b2dc in mongoc_topology_scan_once () from /usr/local/lib/libmongoc-1.0.so.0.0
#9 0x01ccf0f7b244 in _mongoc_topology_do_blocking_scan () from /usr/local/lib/libmongoc-1.0.so.0.0
#10 0x01ccf0f7b88c in mongoc_topology_select_server_id () from /usr/local/lib/libmongoc-1.0.so.0.0
#11 0x01ccf0f290c0 in _mongoc_cluster_select_server_id () from /usr/local/lib/libmongoc-1.0.so.0.0
#12 0x01ccf0f24f14 in _mongoc_cluster_stream_for_optype () from /usr/local/lib/libmongoc-1.0.so.0.0
#13 0x01ccf0f25029 in mongoc_cluster_stream_for_writes () from /usr/local/lib/libmongoc-1.0.so.0.0
#14 0x01ccf0f2e2ad in _mongoc_collection_write_command_execute () from /usr/local/lib/libmongoc-1.0.so.0.0
#15 0x01ccf0f30785 in mongoc_collection_remove () from /usr/local/lib/libmongoc-1.0.so.0.0


I tried several of the latest mongoc versions.

# mongod --version
db version v3.2.22
git version: 105acca0d443f9a47c1a5bd608fd7133840a58dd
OpenSSL version: LibreSSL 3.0.2
allocator: system
modules: none
build environment:
distarch: x86_64
target_arch: x86_64

# gcc -v

Reading specs from /usr/lib/gcc-lib/amd64-unknown-openbsd6.6/4.2.1/specs
Target: amd64-unknown-openbsd6.6
Configured with: OpenBSD/amd64 system compiler
Thread model: posix
gcc version 4.2.1 20070719

# uname -a

OpenBSD m10-csvpn.rt.ru 6.6 GENERIC.MP#3 amd64


Re: RT_TABLEID_MAX behavior changed?

2020-05-18 Thread Bars Bars
To be more precise: when I said the limit became shorter, I was referring
to struct dommp in sys/net/rtable.c.
struct dommp {
	unsigned int	 limit;
	/*
	 * Array to get the routing domain and loopback interface related to
	 * a routing table. Format:
	 *
	 * 8 unused bits | 16 bits for loopback index | 8 bits for rdomain
	 */
	unsigned int	*value;
};
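
The layout described in that comment can be sketched with plain shifts and
masks. This is only an illustration of the documented bit layout, not the
kernel's code; the DOMMP_* macros and the dommp_pack(), dommp_rdomain() and
dommp_loopback() helpers are names made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the widths come from the comment in struct dommp. */
#define DOMMP_RDOMAIN_BITS	8
#define DOMMP_RDOMAIN_MASK	0xffU
#define DOMMP_LOOP_MASK		0xffffU

/* Pack a loopback interface index and an rdomain into one 32-bit value. */
static inline uint32_t
dommp_pack(uint32_t loifidx, uint32_t rdomain)
{
	assert(loifidx <= DOMMP_LOOP_MASK);
	assert(rdomain <= DOMMP_RDOMAIN_MASK);
	return (loifidx << DOMMP_RDOMAIN_BITS) | rdomain;
}

/* Low 8 bits: the routing domain. */
static inline uint32_t
dommp_rdomain(uint32_t v)
{
	return v & DOMMP_RDOMAIN_MASK;
}

/* Next 16 bits: the loopback interface index. */
static inline uint32_t
dommp_loopback(uint32_t v)
{
	return (v >> DOMMP_RDOMAIN_BITS) & DOMMP_LOOP_MASK;
}
```

With this layout an rdomain above 255 cannot be stored without also widening
the rdomain field into the 8 unused top bits.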

In the past the maximum value was limited to u_int16_t in some deep places,
but nowadays only 8 bits are allocated to it according to this struct, plus
8 unused bits which I hope I can safely add to the allocation.
I worry that these unused bits are not guaranteed to users, so the actual
limit is 8 bits instead of the 16 in earlier releases.



Mon, 18 May 2020 at 11:51, Bars Bars :

> Hi, Claudio
>
> I mean these definitions in sys/socket.h:
> /*
>  * Maximum number of alternate routing tables
>  */
> #define RT_TABLEID_MAX  8000
> #define RT_TABLEID_BITS 16
> #define RT_TABLEID_MASK 0x
>
>
> Mon, 18 May 2020 at 10:18, Claudio Jeker :
>
>> On Sun, May 17, 2020 at 10:16:28PM +0300, Bars Bars wrote:
>> > It seems things work only when I rebuild userland completely (I'm
>> > pretty sure that in the past I did it by compiling only the kernel,
>> > correct me if I'm wrong?).
>> >
>> > btw, questions for the devs.
>> > Looking at the CVS history, I am really worried that the
>> > rt_tableid_max limit has not been expanded for years; moreover, it
>> > is now actually 8 bits shorter than it was before the
>> > loopback-to-rdomain map. Many people have more VPNs than that, for
>> > example if they run a centralized VPN setup or use the box as an
>> > inter-AS border router.
>>
>> Sorry, your mail is incredibly imprecise and unclear. There is no
>> rt_tableid_max in OpenBSD, at least not in my tree (grep -r rt_tableid_max
>> returned nothing). So I have no idea what you are talking about and am
>> therefore not able to give you a better answer.
>>
>> > Sun, 17 May 2020, 10:25 Bars Bars :
>> >
>> > > Hey, guys.
>> > >
>> > > I always used rt_tableid_max expanded to the 16-bit range in past
>> > > 5.x releases, and after rebuilding the kernel it worked immediately.
>> > > But now I installed 6.6 on a new system, and after changing
>> > > rt_tableid_max (and the new rt_tableid_mask and bits values too),
>> > > my whole userland throws an "rtable / rdomain too large" error.
>> > > Is there a behaviour change?
>> > > The only thing that changed (as far as I know) is the new
>> > > net/rtable.c struct that maps loopback to domain, where there are
>> > > only 8 unused bits into which I can expand the tableid value.
>> > >
>> > >
>>
>> --
>> :wq Claudio
>>
>


Re: RT_TABLEID_MAX behavior changed?

2020-05-18 Thread Bars Bars
Hi, Claudio

I mean these definitions in sys/socket.h:
/*
 * Maximum number of alternate routing tables
 */
#define RT_TABLEID_MAX  8000
#define RT_TABLEID_BITS 16
#define RT_TABLEID_MASK 0x
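
As a hedged aside, the three definitions presumably have to stay consistent
with each other: the mask should cover exactly the low "bits" bits, and the
maximum has to fit under it. The sketch below checks that relationship with
stand-in names (EX_TABLEID_* and tableid_fits() are hypothetical, not the
real sys/socket.h definitions). The mask is derived from the bit count here
because the value quoted in this mail is cut off.

```c
#include <assert.h>

/* Hypothetical stand-ins, NOT the real sys/socket.h definitions. */
#define EX_TABLEID_BITS	16
#define EX_TABLEID_MASK	((1U << EX_TABLEID_BITS) - 1)
#define EX_TABLEID_MAX	8000

/* Compile-time check: the maximum must fit under the mask. */
static_assert(EX_TABLEID_MAX <= EX_TABLEID_MASK,
    "EX_TABLEID_MAX does not fit in EX_TABLEID_BITS");

/*
 * Return 1 if max fits in a field of the given width, else 0.
 * Widths outside 1..31 are rejected to keep the shift well defined.
 */
static int
tableid_fits(unsigned int max, unsigned int bits)
{
	if (bits == 0 || bits >= 32)
		return 0;
	return max <= ((1U << bits) - 1);
}
```

Under these assumptions a maximum of 8000 fits in a 16-bit field but not in
an 8-bit one, which is the concern raised in this thread.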


Mon, 18 May 2020 at 10:18, Claudio Jeker :

> On Sun, May 17, 2020 at 10:16:28PM +0300, Bars Bars wrote:
> > It seems things work only when I rebuild userland completely (I'm
> > pretty sure that in the past I did it by compiling only the kernel,
> > correct me if I'm wrong?).
> >
> > btw, questions for the devs.
> > Looking at the CVS history, I am really worried that the
> > rt_tableid_max limit has not been expanded for years; moreover, it
> > is now actually 8 bits shorter than it was before the
> > loopback-to-rdomain map. Many people have more VPNs than that, for
> > example if they run a centralized VPN setup or use the box as an
> > inter-AS border router.
>
> Sorry, your mail is incredibly imprecise and unclear. There is no
> rt_tableid_max in OpenBSD, at least not in my tree (grep -r rt_tableid_max
> returned nothing). So I have no idea what you are talking about and am
> therefore not able to give you a better answer.
>
> > Sun, 17 May 2020, 10:25 Bars Bars :
> >
> > > Hey, guys.
> > >
> > > I always used rt_tableid_max expanded to the 16-bit range in past
> > > 5.x releases, and after rebuilding the kernel it worked immediately.
> > > But now I installed 6.6 on a new system, and after changing
> > > rt_tableid_max (and the new rt_tableid_mask and bits values too),
> > > my whole userland throws an "rtable / rdomain too large" error.
> > > Is there a behaviour change?
> > > The only thing that changed (as far as I know) is the new
> > > net/rtable.c struct that maps loopback to domain, where there are
> > > only 8 unused bits into which I can expand the tableid value.
> > >
> > >
>
> --
> :wq Claudio
>


Re: RT_TABLEID_MAX behavior changed?

2020-05-17 Thread Bars Bars
It seems things work only when I rebuild userland completely (I'm pretty
sure that in the past I did it by compiling only the kernel, correct me if
I'm wrong?).
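
One plausible explanation for needing the userland rebuild, offered as an
assumption rather than a confirmed diagnosis: tools that accept a table id
validate it against the limit they were compiled with, so binaries built
against the stock header keep rejecting larger ids until they are rebuilt.
A minimal sketch in portable C follows; COMPILED_TABLEID_MAX and
parse_tableid() are hypothetical names, and 255 is only an illustrative
stand-in for the limit baked into an old binary.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical: the limit this binary was compiled with. */
#define COMPILED_TABLEID_MAX	255

/*
 * Parse a table id and check it against the compile-time limit.
 * Returns NULL on success, or a short error string on failure.
 */
static const char *
parse_tableid(const char *s, unsigned int *out)
{
	char *end;
	long v;

	assert(s != NULL && out != NULL);
	v = strtol(s, &end, 10);
	if (s[0] == '\0' || *end != '\0')
		return "invalid";
	if (v < 0)
		return "too small";
	if (v > COMPILED_TABLEID_MAX)
		return "too large";
	*out = (unsigned int)v;
	return NULL;
}
```

In this sketch, changing the header alone has no effect on an already built
binary: an id such as 8000 is still reported as "too large" until the tool is
recompiled with the new limit.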

btw, questions for the devs.
Looking at the CVS history, I am really worried that the rt_tableid_max
limit has not been expanded for years; moreover, it is now actually 8 bits
shorter than it was before the loopback-to-rdomain map. Many people have
more VPNs than that, for example if they run a centralized VPN setup or use
the box as an inter-AS border router.


Sun, 17 May 2020, 10:25 Bars Bars :

> Hey, guys.
>
> I always used rt_tableid_max expanded to the 16-bit range in past 5.x
> releases, and after rebuilding the kernel it worked immediately.
> But now I installed 6.6 on a new system, and after changing
> rt_tableid_max (and the new rt_tableid_mask and bits values too), my whole
> userland throws an "rtable / rdomain too large" error.
> Is there a behaviour change?
> The only thing that changed (as far as I know) is the new net/rtable.c
> struct that maps loopback to domain, where there are only 8 unused bits
> into which I can expand the tableid value.
>
>


RT_TABLEID_MAX behavior changed?

2020-05-17 Thread Bars Bars
Hey, guys.

I always used rt_tableid_max expanded to the 16-bit range in past 5.x
releases, and after rebuilding the kernel it worked immediately.
But now I installed 6.6 on a new system, and after changing
rt_tableid_max (and the new rt_tableid_mask and bits values too), my whole
userland throws an "rtable / rdomain too large" error.
Is there a behaviour change?
The only thing that changed (as far as I know) is the new net/rtable.c
struct that maps loopback to domain, where there are only 8 unused bits
into which I can expand the tableid value.