Re: Upgraded to 7.5: vfs.ffs.dirhash_dirsize no longer exists and large directory ere veeery slow

2024-04-11 Thread Claudio Jeker
On Thu, Apr 11, 2024 at 06:15:14PM +0200, Otto Moerbeek wrote:
> On Thu, Apr 11, 2024 at 05:29:14PM +0200, Otto Moerbeek wrote:
> 
> > On Thu, Apr 11, 2024 at 05:20:24PM +0200, Otto Moerbeek wrote:
> > 
> > > On Thu, Apr 11, 2024 at 05:08:01PM +0200, Federico Giannici wrote:
> > > 
> > > > On 4/11/24 16:15, Claudio Jeker wrote:
> > > > > On Thu, Apr 11, 2024 at 03:36:29PM +0200, Federico Giannici wrote:
> > > > > > On 4/11/24 14:12, Nick Holland wrote:
> > > > > > > On 4/11/24 05:47, Federico Giannici wrote:
> > > > > > > > > We have a server with A LOT of files in some directories (an email
> > > > > > > > > server in maildir format).
> > > > > > > > > 
> > > > > > > > > Since we upgraded from OpenBSD amd64 7.3 to 7.5 (passing through 7.4) it
> > > > > > > > > became very very very slow to access these large directories!
> > > > > > > ,,,
> > > > > > > You may be being bitten by the removal of softdeps (soft updates)
> > > > > > > in 7.4 more than the availability of a knob to twist.  This was a
> > > > > > > huge hit for some things -- I had one backup job go from a couple
> > > > > > > hours to eight or so hours.  However, it turned out that increase
> > > > > > > in time has not inconvenienced me at all, and some random lockups
> > > > > > > related to softdeps have gone away.  Overall, win for me (the
> > > > > > > fscks after a lockup took hours, too, not to mention all the time
> > > > > > > and effort spent replacing part after part assuming it was a HW
> > > > > > > issue).
> > > > > > > 
> > > > > > > As I understand it...there were known (known unknown?) bugs in the
> > > > > > > softdep code, the code was ugly, and it made it difficult to
> > > > > > > actually improve the code.
> > > > > > 
> > > > > > > No, we knew that softdeps were being deprecated and we removed them from
> > > > > > > everywhere some time ago. It must be something else.
> > > > > > > 
> > > > > > > Anyway, it's strange that dirhash parameters have been changed and removed
> > > > > > > without any mention...
> > > > > 
> > > > > It was not (this is on -current amd64):
> > > > > vfs.ffs.dirhash_dirsize=2560
> > > > > vfs.ffs.dirhash_maxmem=5242880
> > > > > vfs.ffs.dirhash_mem=4832510
> > > > > 
> > > > > Are you sure your kernel and userland are in sync?
> > > > 
> > > > > Well, I followed the prescribed procedure (I've been using OpenBSD for
> > > > > about 20 years).
> > > > > 
> > > > > On EVERY machine upgraded to 7.5 I now have something like this:
> > > > 
> > > > > Elrond:/home/giannici$ sysctl vfs.ffs
> > > > > vfs.ffs.dirhash_maxmem=2560
> > > > > vfs.ffs.dirhash_mem=5242880
> > > > > 
> > > > > Elrond:/home/giannici$ uname -a
> > > > > OpenBSD Elrond.neomedia.it 7.5 GENERIC.MP#82 amd64
> > > > > 
> > > > 
> > > > So, I'm the only one?
> > > > 
> > > > Thanks.
> > > > 
> > > 
> > > > I suspect the cause is in
> > > > https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/ufs/ffs/ffs_extern.h.diff?r1=1.45&r2=1.46&f=h
> > > 
> > > > The max_softdeps entry in ffs_extern.h should have been replaced by a
> > > { 0, 0 }
> > > 
> > > instead of being removed.
> > > 
> > >   -Otto
> > > 
> > 
> > Yes, that fixes it for me:
> > 
> > $ sysctl vfs.ffs
> > vfs.ffs.dirhash_dirsize=2560
> > vfs.ffs.dirhash_maxmem=5242880
> > vfs.ffs.dirhash_mem=767359
> > $
> > 
> 
> to elaborate a bit: the vfs.ffs.dirhash_maxmem entry was actually
> showing the value of the vfs.ffs.dirhash_dirsize entry. Trying to set the
> vfs.ffs.dirhash_maxmem entry would result in setting the
> vfs.ffs.dirhash_dirsize entry, which effectively disables dirhash if
> you set it to a high value.
> 
> It remains a mystery why Claudio is seeing the correct values... you'd
> almost think he uses an old sysctl binary, or his tree is out of sync
> somehow. 
> 

My bad, I ran the command on a system that was still running 7.4 without
realizing it.
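
For anyone wondering how a removed table entry turns into wrong sysctl names:
the sketch below assumes the names are looked up by numeric ID in a plain
array, which is what Otto's { 0, 0 } placeholder suggests. The IDs and the
helper name_for_id() are made up for illustration and are not the real FFS
values.

```c
#include <stddef.h>

/* Illustrative IDs only; the real FFS sysctl numbers differ. */
enum {
	ID_UNUSED,		/* 0 */
	ID_MAX_SOFTDEPS,	/* 1: knob that was removed */
	ID_DIRHASH_DIRSIZE,	/* 2 */
	ID_DIRHASH_MAXMEM,	/* 3 */
	ID_DIRHASH_MEM		/* 4 */
};

struct name_entry {
	const char *name;
	int type;
};

#define NELEM(a)	(sizeof(a) / sizeof((a)[0]))

/* Entry deleted outright: every later name slides down by one slot. */
static const struct name_entry broken_names[] = {
	{ 0, 0 },
	{ "dirhash_dirsize", 1 },
	{ "dirhash_maxmem", 1 },
	{ "dirhash_mem", 1 },
};

/* Entry replaced by a placeholder: names stay aligned with their IDs. */
static const struct name_entry fixed_names[] = {
	{ 0, 0 },
	{ 0, 0 },		/* hole left by max_softdeps */
	{ "dirhash_dirsize", 1 },
	{ "dirhash_maxmem", 1 },
	{ "dirhash_mem", 1 },
};

/* Return the name userland would print for a kernel ID, or NULL. */
static const char *
name_for_id(const struct name_entry *tab, size_t n, size_t id)
{
	if (id >= n)
		return NULL;
	return tab[id].name;
}
```

With the broken table, the ID of dirhash_dirsize is printed as
"dirhash_maxmem" and the ID of dirhash_maxmem as "dirhash_mem", which matches
the shifted output Federico posted.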

-- 
:wq Claudio



Re: Upgraded to 7.5: vfs.ffs.dirhash_dirsize no longer exists and large directory ere veeery slow

2024-04-11 Thread Claudio Jeker
On Thu, Apr 11, 2024 at 03:36:29PM +0200, Federico Giannici wrote:
> On 4/11/24 14:12, Nick Holland wrote:
> > On 4/11/24 05:47, Federico Giannici wrote:
> > > We have a server with A LOT of files in some directories (an email
> > > server in maildir format).
> > > 
> > > Since we upgraded from OpenBSD amd64 7.3 to 7.5 (passing through 7.4) it
> > > became very very very slow to access these large directories!
> > ,,,
> > You may be being bitten by the removal of softdeps (soft updates)
> > in 7.4 more than the availability of a knob to twist.  This was a
> > huge hit for some things -- I had one backup job go from a couple
> > hours to eight or so hours.  However, it turned out that increase
> > in time has not inconvenienced me at all, and some random lockups
> > related to softdeps have gone away.  Overall, win for me (the
> > fscks after a lockup took hours, too, not to mention all the time
> > and effort spent replacing part after part assuming it was a HW
> > issue).
> > 
> > As I understand it...there were known (known unknown?) bugs in the
> > softdep code, the code was ugly, and it made it difficult to
> > actually improve the code.
> 
> No, we knew that softdeps were being deprecated and we removed them from
> everywhere some time ago. It must be something else.
> 
> Anyway, it's strange that dirhash parameters have been changed and removed
> without any mention...

It was not (this is on -current amd64):
vfs.ffs.dirhash_dirsize=2560
vfs.ffs.dirhash_maxmem=5242880
vfs.ffs.dirhash_mem=4832510

Are you sure your kernel and userland are in sync?
-- 
:wq Claudio



Re: 7.5: Fatal errors from eigrpd

2024-04-09 Thread Claudio Jeker
This is most probably fallout from the imsg / ibuf API changes done
in 7.5. I need to set up a test system to see if I can figure out what goes
wrong.

On Mon, Apr 08, 2024 at 08:15:52PM +0200, Mark Leonard wrote:
> (Gah!  Here's the post again in plaintext.  Apologies.)
> 
> Hello all,
> 
> I'm running eigrpd in a VMWare environment and after upgrading to 7.5 from
> 7.4 I'm noticing eigrpd is failing with a couple different errors.  In 7.4
> and prior I never had any problems.
> 
> I tried to include everything that I thought might be relevant but if
> there's any other information I can provide please let me know.
> 
> Has anyone else come across anything similar?
> 
> Thanks,
> Mark
> 
> 
> 
> examples:
> 
> test1# eigrpd -dv
> startup
> eigrp_if_start: lo1 as 1 family ipv4
> eigrp_if_start: em0 as 1 family ipv4
> if_join_ipv4_group: interface em0 addr 224.0.0.10
> rt_new: prefix aa.bb.cc.1/32
> route_new: prefix aa.bb.cc.1/32 via connected distance (28160/0)
> rt_new: prefix 198.18.101.0/24
> route_new: prefix 198.18.101.0/24 via connected distance (28160/0)
> fatal in eigrpe: send_packet: get hdr failed
> rt_del: prefix aa.bb.cc.1/32
> route_del: prefix aa.bb.cc.1/32 via connected
> rt_del: prefix 198.18.101.0/24
> route_del: prefix 198.18.101.0/24 via connected
> route decision engine exiting
> kernel routing table decoupled
> waiting for children to terminate
> terminating
> 
> and
> 
> RouterTest# eigrpd -dv
> startup
> eigrp_if_start: em1 as 1 family ipv4
> if_join_ipv4_group: interface em1 addr 224.0.0.10
> rt_new: prefix 198.18.101.0/24
> route_new: prefix 198.18.101.0/24 via connected distance (28160/0)
> rt_del: prefix 198.18.101.0/24
> route_del: prefix 198.18.101.0/24 via connected
> route decision engine exiting
> kernel routing table decoupled
> waiting for children to terminate
> eigrp engine terminated; signal 11
> terminating
> 
> 
> This is happening on two of two upgraded VMs.
> 
> SHA256 (/usr/sbin/eigrpd) =
> 3b85d7ac155afe4edd355f8b1d8c81f77c6254d96410af8b22f4018b756282a6
> (just in case)
> 
> I've tried with net.inet.tcp.tso=0 and net.inet.tcp.tso=1.  Same result.
> 
> test1# uname -a
> OpenBSD test1.local 7.5 GENERIC.MP#82 amd64
> 
> The configs I'm running are pretty basic:
> 
> RouterTest# eigrpd -n
> configuration OK
> RouterTest# eigrpd -nv
> 
> 
> router-id 198.18.101.1
> fib-update yes
> rdomain 0
> fib-priority-internal 28
> fib-priority-external 28
> fib-priority-summary 28
> 
> 
> address-family ipv4 {
> autonomous-system 1 {
> k-values 1 0 1 0 0 0
> active-timeout 3
> maximum-hops 100
> maximum-paths 4
> variance 8
> default-metric 10 10 255 1 1500
> 
> 
> interface em1 {
> hello-interval 5
> holdtime 15
> delay 10
> bandwidth 10
> split-horizon yes
> }
> }
> }
> 
> 
> address-family ipv6 {
> 
> }

-- 
:wq Claudio



Re: TSO support and performance gain

2024-04-05 Thread Claudio Jeker
On Fri, Apr 05, 2024 at 05:24:27PM +, mabi wrote:
> Hi,
> 
> First thank you for another great OpenBSD release. I just updated my home 
> firewall today and was wondering about the performance of TSO support on bnxt 
> and em interfaces which have been added to the 7.5 release...
> 
> Does anyone know roughly the performance gains by having TSO support on these 
> NICs enabled?
> 

For a firewall the performance gain is around 0.

-- 
:wq Claudio



Re: crawling network with ix driver when routing trafic

2024-03-04 Thread Claudio Jeker
On Mon, Mar 04, 2024 at 11:07:37AM +1100, Aaron Mason wrote:
> Hi!
> 
> It's my understanding that the Realtek network adapters are pretty
> craptacular under load since they basically defer to the OS for
> everything, raising an interrupt each time. Try the fourth test again
> while running top and see if the interrupts (intr) spike during that
> time.
 
re(4) is not the original rl(4) that was known for being very cheap. re(4)
is decent.

-- 
:wq Claudio



Re: crawling network with ix driver when routing trafic

2024-03-04 Thread Claudio Jeker
On Sun, Mar 03, 2024 at 09:38:22PM +0100, Pierre Peyronnel wrote:
> Hey misc,
> 
> Note : I posted on this topic in r/openbsd and before I open a bug, I
> thought I'd ask you.
> 
> My OBSD router has a Realtek (onboard) and an intel (X540 pcie) network
> card, and in one particular situation I get very slow speed.
> I tested using iperf3 and also sftp put/get.
> 
> Here goes:
> (1) When I transfer from/to a host/net A to the router on re0 I get
> symmetrical 1Gbps
> (2) When I transfer from/to a host/net B to the router on ix0  I also get
> symmetrical 1Gbps
> (3) When I transfer from host/net A to host/net B through the router (re0
> -> ix0) I get 1Gbps
> (4) When I transfer from host/net B to host/net A through the router (ix0
> -> re0) I get a crawling 3Mbps
> 
> To make sure, I did a fresh install of 7.4 from scratch (okay, I forgot to
> syspatch it), pfctl -d, sysctl net.inet.ip.forwarding=1 and I got the same
> result.
> 
> When I use another OS (tried Arch linux and OPNSense) I get full 1Gbps in
> all 4 scenarios.
> 
> I'm at a loss and will appreciate any help, short of filing a bug.
> Below dmesg and pcidump.
> Thanks in advance !
> Pierre

Try to disable LRO on the ix(4) card:
ifconfig ix0 -tcplro

Also, could you try -current (with and without tcplro)?
-- 
:wq Claudio



Re: Programmatically add default IPv6 route

2024-02-23 Thread Claudio Jeker
On Fri, Feb 23, 2024 at 06:25:18PM +0100, Denis Fondras wrote:
> Hello,
> 
> I am trying to add IPv6 support for pppd(8) (IPv6CP) and I encounter a blocker
> when adding a default IPv6 route to PPP peer.
> 
> Feb 23 17:26:45 rt-01 pppd[64071]: Couldn't add IPv6 default route: Network is unreachable
> 
> Adding the default route from route(8) works when the connection is established.
> 
> From what I see with route(8), it sends the same route message as pppd(8).
> 
> From `route -v add -inet6 default fe80::ca4c:75ff:fe16:9f00%ppp0` :
> 
> ```
> RTM_ADD: Add Route: len 168, priority 0, table 0, if# 0, pid: 0, seq 1, errno 0
> flags:
> fmask:
> use:0   mtu:0expire:0 
> locks:  inits: 
> sockaddrs: 
>  :: fe80::ca4c:75ff:fe16:9f00%ppp0 default
> ```
> 
> From pppd(8) :
> ```
> got message of size 168 on Fri Feb 23 17:26:45 2024
> RTM_ADD: Add Route: len 168, priority 0, table 0, if# 0, pid: 64071, seq 1, errno 51
> flags:
> fmask:
> use:0   mtu:0expire:0 
> locks:  inits: 
> sockaddrs: 
>  :: fe80::ca4c:75ff:fe16:9f00%ppp0 default
> ```
> 
> However `route monitor -inet6` shows that the message is different when using
> route(8) :
> ```
> got message of size 288 on Fri Feb 23 17:26:22 2024
> RTM_ADD: Add Route: len 288, priority 56, table 0, if# 7, name ppp0, pid: 53003, seq 1, errno 0
> flags:
> fmask:
> use:0   mtu:0expire:0 
> locks:  inits: 
> sockaddrs: 
>  :: fe80::ca4c:75ff:fe16:9f00%ppp0 :: ppp0 fe80::d925:b01f:db25:b020%ppp0 fe80::ca4c:75ff:fe16:9f00%ppp0
> ```
> 
> Should I also send the IFP, IFA and BRD sockaddrs from pppd(8) ?

Don't think so.

> How come messages sent from route(8) have more attributes when received by
> monitor?

The kernel fills those in.

Make sure you encode the IPv6 link-local address correctly. The stupid
KAME hack will haunt you.
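
A rough sketch of what that encoding looks like, assuming the routing socket
wants the KAME-style embedded form: the interface index sits in bytes 2-3 of
the link-local address and sin6_scope_id is cleared. The helper name
embed_linklocal_scope() is made up for this mail; it is not taken from pppd
or route(8).

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: move the interface index of a link-local
 * address from sin6_scope_id into bytes 2-3 of the address itself
 * (network byte order) and clear sin6_scope_id afterwards.
 * Addresses that are not link-local, or have no scope, are untouched.
 */
static void
embed_linklocal_scope(struct sockaddr_in6 *sin6)
{
	uint16_t idx;

	if (!IN6_IS_ADDR_LINKLOCAL(&sin6->sin6_addr) ||
	    sin6->sin6_scope_id == 0)
		return;

	idx = htons((uint16_t)sin6->sin6_scope_id);
	memcpy(&sin6->sin6_addr.s6_addr[2], &idx, sizeof(idx));
	sin6->sin6_scope_id = 0;
}
```

For fe80::ca4c:75ff:fe16:9f00 on an interface with index 7 this yields an
address whose first four bytes are fe 80 00 07. If pppd passes the gateway
with only sin6_scope_id set, that difference would be the first thing I would
check.
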
-- 
:wq Claudio



Re: load balancing with rdomains

2023-12-18 Thread Claudio Jeker
On Mon, Dec 18, 2023 at 01:53:50PM +0100, Marko Cupać wrote:
> On Sat, 16 Dec 2023 18:53:29 +0100
> Petr Ročkai  wrote:
> 
> > Hi,
> > 
> > On Sat, Dec 16, 2023 at 06:37:54PM +0100, Marko Cupać wrote:
> > > > pass in on em0 from (em0:network) to   probability 50% rtable 1
> > > > pass in on em0 from (em0:network) to   probability 50% rtable 2
> > 
> > > IIUC these two only add up to 75% probability – you presumably want
> > probability 50% on the second of the two (the first one then being a
> > match for everything that the later rule doesn't take up).
> 
> Thank you, I can confirm that your solution:
> 
> pass in on em0 from (em0:network) to   rtable 1
> pass in on em0 from (em0:network) to   probability 50% rtable 2
> 
> ... results in what I was trying to achieve - it load balances over both
> uplinks without any blocked packets as long as both uplinks are active.
> 
> What OpenBSD FAQ https://www.openbsd.org/faq/faq6.html#Multipath says
> for a bit different scenario applies to some extent for this one as
> well:
> 
> "It's worth noting that if an interface used by a multipath route goes
> down (i.e., loses carrier), the kernel will still try to forward
> packets using the route that points to that interface. This traffic
> will of course be blackholed and end up going nowhere. It's highly
> recommended to use ifstated(8) to check for unavailable interfaces and
> adjust the routing table accordingly."

Uhm. This is not accurate. The kernel tracks interface state on routes and
will not select a multipath route that is not considered UP. There is a
smaller issue when there is no other multipath route: the lookup will
still select that route and not fall back to a less specific one that is
still up.

Could someone please update the FAQ?

> ...except - if I'm not mistaken - ifstated should in this case adjust
> pf ruleset instead of routing table.
> 
> If so, would using anchors be the best way? Any working examples to
> share? I used some simple ifstated rules but it is hard to wrap my head
> around probability percentages for all the use cases - first link up,
> second down and vice versa.
> 
> Thank you in advance,
> 
> -- 
> Before enlightenment - chop wood, draw water.
> After  enlightenment - chop wood, draw water.
> 
> Marko Cupać
> https://www.mimar.rs/
> 

-- 
:wq Claudio



Re: OpenBSD SMP - BGPd - send_rtmsg: action 1, prefix A.B.C.D/24: No buffer space available - panic: malloc: out of space in kmem_map

2023-12-14 Thread Claudio Jeker
On Tue, Nov 28, 2023 at 05:55:03PM +0100, Laurent CARON wrote:
> 
> Le 28/11/2023 à 17:46, Claudio Jeker a écrit :
> > The problem is that the symbol nkmempages moved into .bss and is therefore
> > no longer modifiable by config(8). I think you can still use ukc via
> > boot -c to alter it (but that is not sticky).
> > 
> > The alternative is to set "option NKMEMPAGES=131072" in your GENERIC
> > config file (or option NKMEMPAGES_MAX=131072). See also options(4).
> > 
> > Long term, the answer is to fix this properly. All of this was built when
> > computers had 100MB of memory, not 100GB.
> > 
> 
> Got it. Thanks.
> 
> It means I'll stick with this kernel for now and see if it helps (it seems
> promising for now).
> 
> Is there a way you can submit this patch (option NKMEMPAGES=131072) to the
> current branch ?

A better calculation logic for nkmempages was added to -current.
On most 64bit archs nkmempages now scales to much larger values.

See https://marc.info/?l=openbsd-cvs&m=170255507530513&w=2 for more
details.

-- 
:wq Claudio



Re: relayd https inspection certificate issue

2023-12-09 Thread Claudio Jeker
On Fri, Dec 08, 2023 at 10:04:25PM +, Philipp Benner wrote:
> Dear all,
> 
>  
> I would like to use relayd as an outbound https proxy, so I configured it
> as shown in the last section of the relayd.conf(5) manpage.
> 
> This works fine for e.g. wikipedia.org. The certificate issued by my relay is
> nearly the same as the original, except for the issuer of course.
> 
> But when I try to visit e.g. heise.de, at least all images refuse to load.
> After some research I found out the following.
> 
> When visiting the site directly without the proxy, I can see that the images
> are loaded from https://heise.cloudimg.io. If I open an image in a new
> browser, I can also see that the certificate's applicant is
> 2e26bae.cloudimg.io; alternative applicants are heise-aws.cloudimg.io and
> heise.cloudimg.io.
> 
> Now if I use the relayd proxy and try to open an image in a separate browser,
> the URL is the same https://heise.cloudimg.io/... but the certificate is
> different. Its applicant is now a.248.e.akamai.net, with alternatives
> *.akamaized.net and some other *.akamai…
> 
> So the self-issued certificate has completely different applicants than the
> original and of course doesn't match the actual server name, and I get the
> error ERR_CERT_COMMON_NAME_INVALID
> 
>  
> Can anybody help or give advice?

Don't do it. This "TLS inspection" mode is broken and it is close to
impossible to fix it. The way the MITM cert is built is not smart enough
and does not consider many special cases like SAN certs and OCSP.
It works for simple things but does not work as a generic SSL interceptor.

-- 
:wq Claudio



Re: Realtek 8723BE unsupported

2023-12-04 Thread Claudio Jeker
On Mon, Dec 04, 2023 at 11:16:04AM +1000, David Gwynne wrote:
> On Sun, Dec 03, 2023 at 06:02:03PM +0100, Jan Stary wrote:
> > (please keep replies on the list)
> > 
> > On Dec 03 12:08:08, kolip...@exoticsilicon.com wrote:
> > > On Sun, Dec 03, 2023 at 02:35:11PM +0100, Jan Stary wrote:
> > > > This is current/amd64 on a HP 260 G2 mini PC (dmesg below).
> > > > Everything works, except the wifi seems to be unsupported:
> > > > 
> > > > "Realtek 8723BE" rev 0x00 at pci2 dev 0 function 0 not configured
> > > 
> > > What does pcidump -v show?
> > 
> > First of all, pcidump -v (but not pcidump) fucks up re(4):
> > 
> > rgephy0 detached
> > re0 detached
> > re0 at pci1 dev 0 function 0 "Realtek 8168" rev 0x10: RTL8168GU/8111GU (0x5080), msi, address 7c:d3:0a:21:eb:f5
> > rgephy0 at re0 phy 7: RTL8251 PHY, rev. 0
> > re0: cannot create re-stats kstat
> > rgephy0 detached
> > re0 detached
> > re0 at pci1 dev 0 function 0 "Realtek 8168" rev 0x10: RTL8168GU/8111GU (0x5080), msi, address 7c:d3:0a:21:eb:f5
> > rgephy0 at re0 phy 7: RTL8251 PHY, rev. 0
> > re0: cannot create re-stats kstat
> > 
> > Is anyone seeing that, i.e. devices detaching
> > when they are being probed by pcidump?
> > 
> > After doing the pcidump -v locally and rebooting to upload, I get this.
> > Note that the Realtek 8168 entry seems mangled (related to the above?).
> 
> pcidump causing a device to detach is a problem, but the kstat bit is a
> separate problem too.
> 
> the diff below consolidates the detach code in re(4) and adds the code
> to tear the kstat down when the device goes away.

Makes sense. One question below.
OK claudio@
 
> Index: ic/re.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/ic/re.c,v
> retrieving revision 1.216
> diff -u -p -r1.216 re.c
> --- ic/re.c   10 Nov 2023 15:51:20 -  1.216
> +++ ic/re.c   4 Dec 2023 01:03:30 -
> @@ -2608,6 +2630,27 @@ freedma:
>  destroy:
>   bus_dmamap_destroy(sc->sc_dmat, re_ks_sc->re_ks_sc_map);
>  free:
> + free(re_ks_sc, M_DEVBUF, sizeof(*re_ks_sc));
> +}
> +
> +void
> +re_kstat_detach(struct rl_softc *sc)
> +{
> + struct kstat *ks = sc->rl_kstat;
> + struct re_kstat_softc *re_ks_sc;
> +
> + if (ks == NULL)
> + return;
> +
> + kstat_remove(ks);
> + re_ks_sc = ks->ks_ptr;
> + kstat_destroy(ks);
> +
> + bus_dmamap_unload(sc->sc_dmat, re_ks_sc->re_ks_sc_map);
> + bus_dmamem_unmap(sc->sc_dmat,
> + (caddr_t)re_ks_sc->re_ks_sc_stats, sizeof(struct re_stats));
> + bus_dmamem_free(sc->sc_dmat, &re_ks_sc->re_ks_sc_seg, 1);

Shouldn't this use re_ks_sc->re_ks_sc_nsegs?
Or actually why save re_ks_sc_nsegs when it is known to be 1?
It is just confusing to see a difference with re_kstat_attach() in this
regard.

> + bus_dmamap_destroy(sc->sc_dmat, re_ks_sc->re_ks_sc_map);
>   free(re_ks_sc, M_DEVBUF, sizeof(*re_ks_sc));
>  }
>  #endif /* NKSTAT > 0 */

-- 
:wq Claudio



Re: OpenBSD SMP - BGPd - send_rtmsg: action 1, prefix A.B.C.D/24: No buffer space available - panic: malloc: out of space in kmem_map

2023-11-28 Thread Claudio Jeker
On Tue, Nov 28, 2023 at 04:50:05PM +0100, Laurent CARON wrote:
> Le 28/11/2023 à 12:12, Claudio Jeker a écrit :
> > So the problem is that the malloc space is filled by
> > a) 26540K of devbuf -- because of the multiqueue support in ixl
> > b) 63493K of ACPI -- what the heck ACPI?!?
> > and then there is not enough space for rtable. A full table requires
> > in your example 50816K of rtable malloc space.
> > 
> > Now on amd64 all of this needs to fit into 128MB which is impossible.
> > 
> > You can use config(8) and bsd.re-config(5) to adjust the nkmempg variable
> > to something like 131072 (which is 4 times the default size).
> > This can be verified with `sysctl vm.nkmempages`
> > 
> > Now ixl(4) and ACPI should not be such pigs but in the end 128MB of kernel
> > malloc space is just stupidly small on a system with 128GB of memory.
> 
> 
> Hi Claudio,
> 
> Thanks.
> 
> I bumped nkmempg to 131072
> 
> 
> 
> # config -e -o bsd.new /bsd
> 
> ukc> nkmempg 131072
> 
> quit
> 
> 
> 
> Then rebooted with the very same issue.
> 
> It seems the nkmempg variable is not properly taken into account since
> 'sysctl vm.nkmempages' still shows 32768 after reboot
> 
> 
> 
> # sysctl vm.nkmempages
> vm.nkmempages=32768
> 
> # config -e -o bsd.new /bsd
> OpenBSD 7.4 (GENERIC.MP) #0: Sun Oct 22 12:13:42 MDT 2023
> r...@syspatch-74-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> Enter 'help' for information
> ukc> nkmempg
> nkmempages = 262144
> 
> Modifying /etc/bsd.re-config and rebooting (twice) didn't help either.
> 
> I 'had' to recompile the kernel (after modifying /usr/src/sys/kern/kern_malloc.c
> with '#define NKMEMPAGES 262144'); the issue is not occurring again.
> 
> Do you recommend using this approach to mitigate the issue, or is there a
> more 'long term' fix?

The problem is that the symbol nkmempages moved into .bss and is therefore
no longer modifiable by config(8). I think you can still use ukc via
boot -c to alter it (but that is not sticky).

The alternative is to set "option NKMEMPAGES=131072" in your GENERIC
config file (or option NKMEMPAGES_MAX=131072). See also options(4).

Long term, the answer is to fix this properly. All of this was built when
computers had 100MB of memory, not 100GB.

-- 
:wq Claudio



Re: OpenBSD SMP - BGPd - send_rtmsg: action 1, prefix A.B.C.D/24: No buffer space available - panic: malloc: out of space in kmem_map

2023-11-28 Thread Claudio Jeker
On Mon, Nov 27, 2023 at 05:51:25PM +0100, Laurent CARON wrote:
> Please find attached the relevant info:
> 
> vmstat-m_SP_with_bgpd -> vmstat -m SP with bgpd
> 
> vmstat-m_SMP_without_bgpd -> vmstat -m SMP without bgpd
> 
> vmstat-m_SMP_with_bgpd_0{01..11} -> vmstat -m SMP with bgpd until crash.
> 
> 
> Thanks
> 
> Laurent
> 
> Le 27/11/2023 à 17:10, Claudio Jeker a écrit :
> > vmstat -m

So the problem is that the malloc space is filled by
a) 26540K of devbuf -- because of the multiqueue support in ixl
b) 63493K of ACPI -- what the heck ACPI?!?
and then there is not enough space for rtable. A full table requires
in your example 50816K of rtable malloc space.

Now on amd64 all of this needs to fit into 128MB which is impossible.

You can use config(8) and bsd.re-config(5) to adjust the nkmempg variable
to something like 131072 (which is 4 times the default size).
This can be verified with `sysctl vm.nkmempages`

Now ixl(4) and ACPI should not be such pigs but in the end 128MB of kernel
malloc space is just stupidly small on a system with 128GB of memory.
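
To make the numbers concrete, a back-of-the-envelope sketch. It assumes 4 KiB
pages on amd64 and only counts the three consumers named above, so the real
kmem_map is under more pressure than this shows. The helper names are
illustrative and not kernel code.

```c
#include <stdint.h>

#define PAGE_BYTES	4096ULL	/* assumption: amd64 page size */

/* Size of the kernel malloc space for a given nkmempages, in bytes. */
static uint64_t
kmem_bytes(uint64_t nkmempages)
{
	return nkmempages * PAGE_BYTES;
}

/* Same, in MiB. */
static uint64_t
kmem_mib(uint64_t nkmempages)
{
	return kmem_bytes(nkmempages) >> 20;
}

/* Does a demand given in KiB (as printed by vmstat -m) fit? */
static int
kmem_fits(uint64_t demand_kib, uint64_t nkmempages)
{
	return demand_kib * 1024 <= kmem_bytes(nkmempages);
}
```

32768 pages are 128 MiB and 131072 pages are 512 MiB. The demand from the
vmstat output is 26540K + 63493K + 50816K = 140849K, roughly 137.5 MiB, so it
cannot fit in the default but fits easily in the larger setting.
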
-- 
:wq Claudio



Re: OpenBSD SMP - BGPd - send_rtmsg: action 1, prefix A.B.C.D/24: No buffer space available - panic: malloc: out of space in kmem_map

2023-11-27 Thread Claudio Jeker
On Mon, Nov 27, 2023 at 04:57:35PM +0100, Laurent CARON wrote:
> Hi,
> 
> I'm currently migrating a BGPd server.
> 
> Specs of "old" machine:
> 
> - Dell R720 with Intel(R) Xeon(R) CPU E5-2637 v2 and 16GB RAM
> 
> - SMP Kernel (default)
> 
> - BGPd runs fine with 5 full views
> 
> - X710 NIC (ixl) 4 port interface
> 
> Specs of "new" machine:
> 
> - Dell R750xs with Intel(R) Xeon(R) Gold 6334 CPU @ 3.60GHz and 128GB RAM
> 
> - SMP Kernel (default)
> 
> - X710 NIC (ixl) 2 nics with 2 ports each
> 
> - BGPd crashes with "panic: malloc: out of space in kmem_map" (please see
> screenshot).
> 
> - When launching 'bgpd -dv' on the console, logs are showing:
> 
> send_rtmsg: action 1, prefix 179.62.148.0/24: No buffer space available
> send_rtmsg: action 1, prefix 176.59.72.0/23: No buffer space available
> send_rtmsg: action 1, prefix 176.59.70.0/23: No buffer space available
> send_rtmsg: action 1, prefix 176.59.74.0/23: No buffer space available
> send_rtmsg: action 1, prefix 185.78.92.0/22: No buffer space available
> send_rtmsg: action 1, prefix 176.59.64.0/23: No buffer space available
> send_rtmsg: action 1, prefix 176.59.66.0/23: No buffer space available
> 
> .
> 
> send_rtmsg: action 1, prefix 31.132.21.0/24: No buffer space available
> send_rtmsg: action 1, prefix 38.94.167.0/24: No buffer space available
> 
> then the machine crashes after having processed a few thousand prefixes.
> 
> When using the SP (boot /bsd.sp) kernel, the issue doesn't arise.
> 
> Do you have any pointer to solve this issue ?

Please send vmstat -m output of the affected machine.
The problem is probably the multiqueue support in ixl(4) that consumes too
much memory.

-- 
:wq Claudio



Re: Jumbo frame, just a little late..

2023-11-07 Thread Claudio Jeker
On Tue, Nov 07, 2023 at 11:33:04AM +0100, Daniele B. wrote:
> 
> Sorry Claudio, my fault.
> 
> wiz# ifconfig reX hwfeatures
> hwfeatures= [*] hardmtu 9194
> 
> by hostname.reX: 
> 
>   wiz# nano /etc/hostname.reX:
>   inet 192.168.XXX.XXX 0xff00 mtu 9018

This is not what hostname.if documents as a correct command line.

It is best to put mtu 9018 on a line of its own.

Using an 'inet' line with options requires all of 'addr netmask
broadcast_addr' to be set. You are missing the broadcast_addr.
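
Something along these lines, as a sketch only. The address and netmask are
placeholders from the documentation range, not your values:

```
inet 192.0.2.10 255.255.255.0
mtu 9018
```

The second line is handed to ifconfig on its own, so it does not depend on the
field order of the inet line.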


>   ctrl+S; ctrl+X
> 
>   wiz# sh /etc/netstart
> 
>   ifconfig: mtu: bad value
> 
>   (same eventually at boot time)
> 
> by shell or rc.local:
> 
>   wiz# ifconfig reX mtu 9018
>(accepted)
>   wiz# ifconfig reX
> 
>   reX: flags=8843 mtu 9018
>  lladdr XX:XX:XX:XX:XX:XX
>  index 1 priority 0 llprio 3
>  groups: egress
>  media: Ethernet autoselect (1000baseT full-duplex,master,rxpause,txpause)
>  status: active
>  inet 192.168.XXX.XXX netmask 0xff00 broadcast 192.168.XXX.XXX
> 
> 
> == Daniele Bonini
> 
> 
> Claudio Jeker  wrote:
> 
> > Sorry this bug report lacks all important information.
> > 
> > a) what are the contents of your hostname.mynicdevice
> > b) where does the error pop up? neither netstart nor ifconfig contain
> > the word "wrong"
> > c) what interface are you playing with?
> > 
> > So we can't help you.
> 

-- 
:wq Claudio



Re: Jumbo frame, just a little late..

2023-11-07 Thread Claudio Jeker
On Tue, Nov 07, 2023 at 10:21:35AM +0100, Daniele B. wrote:
> Hello,
> 
> Actually I'm not sure about the real benefits of it for a SOHO
> environment like mine, but after 17 years I decided to take jumbo
> frames seriously and set the MTU of my network equipment to 9018.
> I was also happy to see that my old Mac has jumbo frames hard
> coded, with MTU 9018 as the second choice in the hardware settings.
> 
> About OpenBSD (7.3 stable), the only thing I need an explanation
> for is the reason for the error "wrong MTU value" popping up when setting
> jumbo frames directly via hostname.mynicdevice, while the setting goes
> through smoothly via ifconfig manually or from rc.local. Is the NIC device
> initialization dependent on a sane 1500 MTU value, maybe?

Sorry this bug report lacks all important information.

a) what are the contents of your hostname.mynicdevice
b) where does the error pop up? neither netstart nor ifconfig contain the
word "wrong"
c) what interface are you playing with?

So we can't help you.
-- 
:wq Claudio



Re: Regression (or misconfig on my side?) after OpenOSPFd upgrade (OpenBSD 7.3 -> 7.4)

2023-11-07 Thread Claudio Jeker
On Tue, Nov 07, 2023 at 08:21:16AM +0100, Laurent CARON wrote:
> Hi,
> 
> After upgrading a 7.3 to 7.4 OpenBSD box, I noticed OSPF adjacencies using a
> password are not coming up with the following in /var/log/messages:
> 
> ospfd[55040]: recv_packet: authentication error, neighbor ID X.X.X.X
> interface vlanXX
> 
> After removing the authentication, I was able to get adjacencies to come up.
> 
> Config contains:
> 
> password=""
> auth-key $password
> auth-type simple
> 
> This config was working perfectly fine until OpenBSD 7.3 but fails with 7.4
> 
> The 'only' way I found to have it working is to get rid of authentication.
> 

Ugh. My bad. I forgot that iface->auth_key is not really a string. So the
code setting the auth_key would copy too much if you use a password with 8
chars. Using a password with 7 or less chars works fine.

As a result of this overflow the checksum calculation in auth_validate
fails and that's what you see.

Diff below should fix this.
-- 
:wq Claudio

Index: auth.c
===================================================================
RCS file: /cvs/src/usr.sbin/ospfd/auth.c,v
diff -u -p -r1.22 auth.c
--- auth.c  3 Jul 2023 09:40:47 -   1.22
+++ auth.c  7 Nov 2023 09:56:44 -
@@ -166,7 +166,8 @@ auth_gen(struct ibuf *buf, struct iface 
fatalx("auth_gen: ibuf_set failed");
 
if (ibuf_set(buf, offsetof(struct ospf_hdr, auth_key),
-   iface->auth_key, strlen(iface->auth_key)) == -1)
+   iface->auth_key, strnlen(iface->auth_key,
+   sizeof(iface->auth_key))) == -1)
fatalx("auth_gen: ibuf_set failed");
break;
case AUTH_CRYPT:
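
For the curious, a stand-alone sketch of why the bound matters. It assumes an
8-byte key field that carries no terminating NUL when the password is exactly
8 characters; struct fake_iface is a stand-in and not the real ospfd
structure.

```c
#include <string.h>

#define AUTH_KEY_LEN	8	/* assumption: size of the simple-auth key */

struct fake_iface {
	char auth_key[AUTH_KEY_LEN];	/* no NUL when the key is 8 chars */
	char next_field[4];		/* whatever happens to follow */
};

/*
 * Length of the key, never looking past the end of the field.
 * A plain strlen() would keep reading into next_field (and beyond)
 * for a full 8-character key. strnlen() is POSIX.1-2008.
 */
static size_t
auth_key_len(const struct fake_iface *iface)
{
	return strnlen(iface->auth_key, sizeof(iface->auth_key));
}
```

A 7-character key still has its NUL inside the field, which is why shorter
passwords kept working.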



Re: What could cause high CPU load averages (no actual CPU usage)?

2023-10-27 Thread Claudio Jeker
On Fri, Oct 27, 2023 at 01:54:28AM +0200, Justin Yates Fletcher wrote:
> On Wed, 2023-10-25 at 20:25 -0400, Raul Miller wrote:
> > On Wed, Oct 25, 2023 at 8:16 PM Justin Yates Fletcher
> >  wrote:
> > > On Wed, 2023-10-25 at 21:12 +0200, Mike Fischer wrote:
> > > > 
> > > > > Am 25.10.2023 um 17:57 schrieb Theo de Raadt
> > > > > :
> > > > > Mike Fischer  wrote:
> > > > > > > Am 25.10.2023 um 17:29 schrieb Theo de Raadt
> > > > > We changed a lot of kernel scheduling code *without giving a
> > > > > damn
> > > > > about the stability of this number*
> > > > 
> > > > Fine, but you are not changing my running Kernel, are you?
> > > 
> > > I don't understand your point with this. Are you making an
> > > accusation?
> > > If not, then why even write this?
> > 
> > I think Mike Fischer's point was that the change did not correspond
> > to
> > a kernel upgrade.
> > 
> 
> It is hyperbole or accusatory... or somewhere on that spectrum.
> Either way, it serves no valuable purpose, so why even write that?
> 
> Also, there was a kernel change: 7.4. Pretty sure that was mentioned.
> 
> 
> > (And I think Theo de Raadt's point was that there's not enough rigor
> > on load average to diagnose this issue.)
> > 
> 
> Theo's point, as I read it, was just that the load average is
> calculated in the same way as before, even though there have been
> changes in other parts of the system that could affect it.   

Just to be clear: there was a change in how the load average is
calculated, so it may cause differences in numbers. Do we care about that?
No, because it was done to be able to work on more important projects.
 
> It has nothing to do with rigor. The OS could just always report 0.0. 
> If you start artificially changing a metric, for the sake of rigor, then
> that metric is no longer valuable:
> 
> https://en.wikipedia.org/wiki/Goodhart%27s_law
> 
> Changing how a metric is calculated to meet a target certainly reduces
> the value of the metric, right?

I do not agree. The load average has some value, but most people do not
understand how it is calculated and what a significant change is.
Also, systems change, so metrics change all the time. They still offer
good value.

-- 
:wq Claudio



Re: What could cause high CPU load averages (no actual CPU usage)?

2023-10-25 Thread Claudio Jeker
On Wed, Oct 25, 2023 at 05:24:50PM +0200, Mike Fischer wrote:
> 
> > Am 25.10.2023 um 17:07 schrieb Theo de Raadt :
> > 
> > Claudio Jeker  wrote:
> > 
> >> On Wed, Oct 25, 2023 at 11:57:54AM +0200, Mike Fischer wrote:
> >>> I have been observing occasional bouts of high load averages on several
> >>> servers I administer and I am trying to find the cause. (I monitor these
> >>> machines so that I can implement corrective measures in case of any
> >>> malicious or abnormal activity. I think this is benign, but I’d still
> >>> like to find the cause.)
> >>> 
> >>> Once the high load average starts, only a reboot seems to (temporarily)
> >>> return the values to their normal levels.
> >>> 
> >>> The actual CPU usage (as measured by vmstat) stays low even if the load
> >>> average is elevated.
> >>> 
> >>> The servers are VMs running on a VMWare host (ESXi). This was seen with
> >>> OpenBSD 7.3 and 7.4 amd64.
> >>> 
> >>> I can not determine anything inside the VM that causes this. There seems
> >>> to be no correlation to pfstat(8) graphs, log entries, known events, or
> >>> anything else I can determine. restarting all of the rc.d services never
> >>> made any difference.
> >>> 
> >>> Could this be caused by something on the VMWare host machine? (The host
> >>> seems to be operating at limit regarding RAM for example. But the VM is
> >>> only using the normal percentage of its allocated RAM — way below 100%
> >>> and very constant usage, no swap.)
> >>> 
> >>> How can I further debug this, keeping in mind that these are production
> >>> machines and experimentation is limited to benign things that don’t
> >>> cause outages.
> >>> 
> >> 
> >> What is high? A high CPU load for me is in the order of 70+.
> >> Please remember the CPU load average is a horrible leftover from TENEX
> >> days. The system just counts how many processes are runnable but it is a
> >> very bad indicator of actual CPU load.
> > 
> > Furthermore, every operating system counts this in a different way.
> > You might think there is only one way to count it.  Not at all.
> 
> True. But like I said, this was noticed because of the sudden increase
> on the same (OpenBSD) machine without any obvious reason. I am not
> implying that the value of 0.7 is in any way critical. Just that an
> increase from a long time load average of 0.0x to 0.7x is noteworthy. I
> have no issue when the load increases when a machine is handling
> requests or doing something I know about. But then the load should drop
> back to normal levels once the task is finished. That did not happen in
> the cases I’m trying to figure out.

A process that is started every 5 seconds and exits after 10ms of
computation can cause the load to go up by 1. It just matters whether it runs
during the sampling time or not.  This is why the load average is not
accurate; it is an indication, and if the value is below the number of CPUs
you may well see quantization errors.
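
A rough simulation of that effect, as a sketch only. It assumes the number of
runnable processes is sampled once every 5 seconds and folded into an
exponentially decaying 1-minute average; the real kernel constants and code
may differ, and the function name and parameters are made up.

```python
import math


def simulate(offset, period=5.0, runtime=0.010, sample_every=5.0, samples=200):
    """Return (load_avg, cpu_utilisation) for one short periodic job.

    offset is how long after each sampling tick the job starts.
    """
    decay = math.exp(-sample_every / 60.0)   # 1-minute decay per sample
    load = 0.0
    for k in range(samples):
        t = k * sample_every
        phase = (t - offset) % period         # time since the job last started
        runnable = 1 if phase < runtime else 0
        load = load * decay + runnable * (1.0 - decay)
    return load, runtime / period


load_hit, util = simulate(offset=0.0)    # job starts exactly at the tick
load_miss, _ = simulate(offset=1.0)      # job runs between two ticks
print(round(load_hit, 2), load_miss, util)
```

The job uses 0.2% of one CPU in both runs. When it happens to line up with
the sampling tick the average creeps up to about 1; when it does not, the
average stays at 0.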

So yes, maybe there is something going on, but even top -s .1 -I will have a
hard time showing it to you. It may be too small of a blip to spot.
-- 
:wq Claudio



Re: What could cause high CPU load averages (no actual CPU usage)?

2023-10-25 Thread Claudio Jeker
On Wed, Oct 25, 2023 at 11:57:54AM +0200, Mike Fischer wrote:
> I have been observing occasional bouts of high load averages on several
> servers I administer and I am trying to find the cause. (I monitor these
> machines so that I can implement corrective measures in case of any
> malicious or abnormal activity. I think this is benign, but I’d still
> like to find the cause.)
> 
> Once the high load average starts, only a reboot seems to (temporarily)
> return the values to their normal levels.
> 
> The actual CPU usage (as measured by vmstat) stays low even if the load
> average is elevated.
> 
> The servers are VMs running on a VMWare host (ESXi). This was seen with
> OpenBSD 7.3 and 7.4 amd64.
> 
> I can not determine anything inside the VM that causes this. There seems
> to be no correlation to pfstat(8) graphs, log entries, known events, or
> anything else I can determine. restarting all of the rc.d services never
> made any difference.
> 
> Could this be caused by something on the VMWare host machine? (The host
> seems to be operating at limit regarding RAM for example. But the VM is
> only using the normal percentage of its allocated RAM — way below 100%
> and very constant usage, no swap.)
> 
> How can I further debug this, keeping in mind that these are production
> machines and experimentation is limited to benign things that don’t
> cause outages.
> 

What is high? A high CPU load for me is in the order of 70+.
Please remember the CPU load average is a horrible leftover from TENEX
days. The system just counts how many processes are runnable but it is a
very bad indicator of actual CPU load.

-- 
:wq Claudio



Re: Question about rdomains/rtables

2023-10-24 Thread Claudio Jeker
On Mon, Oct 23, 2023 at 06:08:37PM +0200, tetrosalame wrote:
> Hello misc,
> 
> I'm playing with rdomain/rtable on OpenBSD 7.4 and I'm a bit confused about
> the relation between rdomains and rtables.
> 
> If I got rdomain(4) right, the two facilities are designed so that a rdomain
> can hold 0-255 rtables. Even rdomain 0 -no rdomain configured- can hold
> several rtables. IP addresses can overlap if configured in different
> rdomains.

No, this is not right. rtables are part of rdomains. So rdomain 0 has
rtable 0. rdomain 1 uses rtable 1. rdomain 2 uses rtable 2 and so on.

Now it is possible to assign an extra rtable to an rdomain but as you
found out there is no tool right now to allow this for any rdomain != 0.

Doing this properly would probably require some new route(4) messages so
that userland daemons can act on this as well. I never really needed this
flexibility so I never implemented it.
 
> In my mind the design is somehow "hierarchical"
> 
> rdomain 0
> |--> rtable 0
> |--> rtable 1
> |...
> |--> rtable 255
> 
> rdomain 1
> |--> rtable 0
> |--> rtable 1
> |...
> |--> rtable 255
> 
> but in practice, since there's no utility to add more rtables beyond the
> default one per rdomain, in the current implementation OS tools (pf, route,
> ifconfig, daemons etc...) take advantage of these facilities in a "flat"
> way:
> 
> rdomain 0
> |--> rtable 0
> 
> rdomain 1
> |--> rtable 0

This is a wrong view. The system has 255 rtables. You can make an rtable
an rdomain when the rtable is using itself to look up link-local addresses.

So the visualisation is the other way around:

rtable 0 => rdomain 0
rtable 1 => rdomain 1
rtable 2 => rdomain 2
...
rtable 42 => rdomain 0
...

In this case the tables 0, 1, 2 are rdomains while table 42 is just an
alternate routing table for rdomain 0.
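
The same relation written down as a tiny Python model, using only the example
numbers from above. This is a sketch of the concept, not of any kernel data
structure; the dict and the helper names are made up.

```python
# rtable id -> rdomain id, taken from the example above.
rtable_to_rdomain = {0: 0, 1: 1, 2: 2, 42: 0}


def is_rdomain(rtable: int) -> bool:
    """An rtable is an rdomain when it points at itself."""
    return rtable_to_rdomain[rtable] == rtable


def rtables_of(rdomain: int) -> list:
    """All rtables that belong to the given rdomain."""
    return sorted(t for t, d in rtable_to_rdomain.items() if d == rdomain)


print(is_rdomain(2), is_rdomain(42), rtables_of(0))
```

So the lookup always goes from the rtable to its rdomain; rdomain 0 here ends
up with rtables 0 and 42, and 42 is not an rdomain of its own.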

> 
> and so on, where rtables are numbered after their containing rdomain.
> Documentation refers to rdomains when it's appropriate to think about a
> logical segment of the routing space, while it refers to rtables when the
> concept is "do something with routing table number XXX".
> 
> So while in theory one should think about rdomains first and then about the
> rtables that belong to each of them, in current usage they're the same
> thing: $tool -T $number and don't bother.
> 
> But...I read the slides presented by Peter Hessler (thank you) at EuroBSD
> 2012 and everything was clear...well, until I came to slide 16 and pf
> ruleset "pass in on rdomain 2 rtable 4" (1). I'm puzzled: how can I "create"
> rtable 4 inside rdomain 2?

That rule matches packets on rdomain 2 and uses rtable 4 (which can be an
rdomain) to forward the packets.
 
> Thanks and I apologize for my lack of brevity.
> 
> f.
> 
> 1:
> https://www.openbsd.org/papers/eurobsd2012/phessler-rdomains/mgp00016.html
> 

-- 
:wq Claudio



Re: Default rdomain for CLI commands

2023-10-24 Thread Claudio Jeker
On Tue, Oct 24, 2023 at 08:39:33AM -, Stuart Henderson wrote:
> On 2023-10-24, Andy Lemin  wrote:
> > Hi all,
> >
> > Just a quick question.
> >
> > I have multiple rdomains. My outside rdomain (rdomain 0) has a single 
> > default route to my ISP. And my internal rdomain 9 has multiple default 
> > routes pointing to various pairX interfaces for some funky routing stuff.
> >
> > Everything works beautifully, however, every command I type on the box 
> > locally or over SSH which needs internet for example, is being executed 
> > under the internal rdomain, not the edge rdomain.
> >
> > So I have to run;
> > ‘route -T0 exec syspatch’ for example.
> >
> > How do I set/override the default rdomain for system level CLI commands?
> 
> The basic answer to your question is "set rtable in login.conf for the
> relevant class". But that doesn't explain why your machine is not already
> using rtable 0..
> 

Because I think login.conf(5) is wrong. The default rtable is not 0. If
rtable is not set the current rtable is not modified by login_cap(3).

-- 
:wq Claudio

Index: login.conf.5
===
RCS file: /cvs/src/share/man/man5/login.conf.5,v
retrieving revision 1.70
diff -u -p -r1.70 login.conf.5
--- login.conf.5  31 Mar 2022 17:27:23 -  1.70
+++ login.conf.5  24 Oct 2023 08:41:21 -
@@ -284,7 +284,7 @@ Initial priority (nice) level.
 Require home directory to login.
 .\"
 .Pp
-.It rtable Ta number Ta Dv 0 Ta
+.It rtable Ta number Ta "" Ta
 Rtable to be set for the class.
 .\"
 .Pp



Re: Default rdomain for CLI commands

2023-10-24 Thread Claudio Jeker
On Tue, Oct 24, 2023 at 06:56:33PM +1100, Andy Lemin wrote:
> Hi Lyndon,
> That is a good trick, I will try that.
> 
> But it is more of an unexpected nuisance as I’m expecting the default to
> be rdomain 0.

No, rdomains are inherited. Once a process runs in rdomain X all children
will also be in rdomain X. With this, logging in via sshd will inherit the
rdomain of the sshd process.

Now you could look into login.conf(5) and try forcing rtable to 0 for your
login class. If the login respects the settings you will get rdomain 0 all
the time.
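
As a toy model in Python (not the kernel or login_cap(3) code; the class and
function names are made up, and the "leave the rtable alone when the class
does not set one" behaviour is how I read the code):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Proc:
    name: str
    rtable: int

    def fork(self, name: str) -> "Proc":
        # A child starts out in the rtable of its parent.
        return Proc(name, self.rtable)


def login(parent: Proc, class_rtable: Optional[int]) -> Proc:
    """Fork a login shell; only touch the rtable if the class sets one."""
    shell = parent.fork("ksh")
    if class_rtable is not None:
        shell.rtable = class_rtable
    return shell


sshd = Proc("sshd", rtable=9)                  # sshd was started in rdomain 9
print(login(sshd, class_rtable=None).rtable)   # class does not set rtable
print(login(sshd, class_rtable=0).rtable)      # class forces rtable 0
```

Without an rtable in the login class the shell stays in 9, the rdomain sshd
runs in. With rtable set to 0 in the class the shell ends up in rdomain 0.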
 
> It seems to switch to use the rdomain with the most default routes which
> breaks things unexpectedly - for example many crontab commands break
> after adding routes, so now have to _always_ prefix with route -T0 exec
> (to support automated route changes etc).

No, it does not.
 
> This must be unexpected behaviour to change dynamically like this?

There is no dynamic change. As said, the rdomain is inherited over fork.
It is probably set by the rc.d script and from there on it sticks to that.
 
> Thanks for your help, Andy.
> 
> 
> > On 24 Oct 2023, at 14:09, Lyndon Nerenberg (VE7TFX/VE6BBM) 
> >  wrote:
> > 
> > Andy Lemin writes:
> > 
> >> So I have to run;
> >> ‘route -T0 exec syspatch’ for example.
> >> 
> >> How do I set/override the default rdomain for system level CLI commands?
> > 
> > If you're talking about running a bunch of interactive shell commands
> > in rdomain 0, just 'route -T0 exec sh' to drop into a sub-shell in
> > rdomain 0.
> > 
> > --lyndon
> 

-- 
:wq Claudio



Re: OpenBSD 7.3 found a process with PID 0

2023-09-27 Thread Claudio Jeker
On Tue, Sep 26, 2023 at 06:12:20PM +0200, Alessandro Baggi wrote:
> 
> 
> Il 26/09/23 17:30, Claudio Jeker ha scritto:
> > On Tue, Sep 26, 2023 at 05:13:46PM +0200, Andreas Kähäri wrote:
> > > On Tue, Sep 26, 2023 at 04:59:22PM +0200, Alessandro Baggi wrote:
> > > > Hi list,
> > > > running this python3 script:
> > > > 
> > > > #!/usr/bin/env python3
> > > > import psutil
> > > > 
> > > > pids = psutil.pids()
> > > > for i in pids:
> > > >     p = psutil.Process(i)
> > > >     with p.oneshot():
> > > >         print(str(i) + " " + p.name())
> > > > 
> > > > The result start with:
> > > > 
> > > > 0 swapper
> > > > 1 init
> > > > 536 smtpd
> > > > 868 ksh
> > > > ...
> > > > 
> > > > This process does not appear in ps, top and htop.
> > > 
> > > $ ps -p 0
> > >   PID TT  STAT        TIME COMMAND
> > >     0 ??  DK       0:02.19 (swapper)
> > > 
> > > For top, you need to press S to show system processes.  I don't use
> > > htop, but I assume it has a similar capability to show system processes.
> > > 
> > > > 
> > > > How could be that there is a process with PID 0 before init?
> > > > Probably I'm missing something about OpenBSD core.
> > > > 
> > > > Can someone point me in the right direction?
> > > > 
> > > 
> > > See uvm_init(9):
> > > 
> > >   The swapper process swaps in runnable processes that are
> > >   currently swapped out, if there is room.
> > > 
> > 
> > ... and this is a lie. The swapper process does nothing.
> > 
> 
> Ok, but why it is running?

Because it is the main() thread and nobody cleaned up that mess.

-- 
:wq Claudio



Re: OpenBSD 7.3 found a process with PID 0

2023-09-26 Thread Claudio Jeker
On Tue, Sep 26, 2023 at 05:13:46PM +0200, Andreas Kähäri wrote:
> On Tue, Sep 26, 2023 at 04:59:22PM +0200, Alessandro Baggi wrote:
> > Hi list,
> > running this python3 script:
> > 
> > #!/usr/bin/env python3
> > import psutil
> > 
> > pids = psutil.pids()
> > for i in pids:
> >     p = psutil.Process(i)
> >     with p.oneshot():
> >         print(str(i) + " " + p.name())
> > 
> > The result start with:
> > 
> > 0 swapper
> > 1 init
> > 536 smtpd
> > 868 ksh
> > ...
> > 
> > This process does not appear in ps, top and htop.
> 
> $ ps -p 0
>   PID TT  STAT        TIME COMMAND
>     0 ??  DK       0:02.19 (swapper)
> 
> For top, you need to press S to show system processes.  I don't use
> htop, but I assume it has a similar capability to show system processes.
> 
> > 
> > How could be that there is a process with PID 0 before init?
> > Probably I'm missing something about OpenBSD core.
> > 
> > Can someone point me in the right direction?
> > 
> 
> See uvm_init(9):
> 
>  The swapper process swaps in runnable processes that are
>  currently swapped out, if there is room.
> 

... and this is a lie. The swapper process does nothing.

-- 
:wq Claudio



Re: IPv6 link-local addresses outside of fe80::/64 are not handled correctly

2023-07-12 Thread Claudio Jeker
On Wed, Jul 12, 2023 at 10:59:13AM -0600, Zack Newman wrote:
> On 7/12/23 10:20, Claudio Jeker wrote:
> > You are missing something. It is called the KAME hack or embedded scope.
> > The KAME IPv6 implementation hijacks the 2nd 16bit addr part to store the
> > scope_id.  In some cases this embedded scope escapes in the addrs printed.
> > Especially the "ndp info overwritten for" is leaking the scope_id (4)
> > which is probably the interface index of your em0 interface.
> > 
> > Welcome to IPv6, the world would be better without all the garbage.
> 
> As predicted, em0 does in fact have index 4. Follow up question. Am I
> to interpret this as purely a display problem and not a functional one?

Depends. It is mostly a display issue until it isn't. The above is a
display issue.

> If so, can you explain why when I have the following rule in pf.conf(5):
> 
> block out quick on $wan inet6 to fe80:4::c6ca:2bff:fe5a:8723%em0
> 
> I am still able to ping6(8) it:
> 
> router$ ping6 -c1 fe80:4::c6ca:2bff:fe5a:8723%em0
> PING fe80:4::c6ca:2bff:fe5a:8723%em0 (fe80:4::c6ca:2bff:fe5a:8723%em0): 56 
> data bytes
> 64 bytes from fe80::c6ca:2bff:fe5a:8723%em0: icmp_seq=0 hlim=64 time=7.294 ms
> 
> --- fe80:4::c6ca:2bff:fe5a:8723%em0 ping statistics ---
> 1 packets transmitted, 1 packets received, 0.0% packet loss
> round-trip min/avg/max/std-dev = 7.294/7.294/7.294/0.000 ms
> 
> meanwhile if I remove the "4", I am unable to ping6(8) it?:

Because the two addresses are not the same (in some cases).
Confusing? Yes it is.
 
> router$ ping6 -c1 fe80:4::c6ca:2bff:fe5a:8723%em0
> PING fe80:4::c6ca:2bff:fe5a:8723%em0 (fe80:4::c6ca:2bff:fe5a:8723%em0): 56 
> data bytes
> ping6: sendmsg: Permission denied
> ping: wrote fe80:4::c6ca:2bff:fe5a:8723%em0 64 chars, ret=-1
> 
> --- fe80:4::c6ca:2bff:fe5a:8723%em0 ping statistics ---
> 1 packets transmitted, 0 packets received, 100.0% packet loss
> 
> I should add that I can replace the second octet pair with any non-zero
> value, and I am unable to block it. Asked differently, how would I be
> able to block traffic to/from fe80:4::c6ca:2bff:fe5a:8723%em0 while
> still allowing traffic to/from fe80::c6ca:2bff:fe5a:8723%em0 where "4"
> is interpreted as not the scope_id but in fact part of the address since
> seemingly "%em0" is sufficient without scope_id?

You can't (they are the same).

-- 
:wq Claudio



Re: IPv6 link-local addresses outside of fe80::/64 are not handled correctly

2023-07-12 Thread Claudio Jeker
On Wed, Jul 12, 2023 at 08:23:36AM -0600, Zack Newman wrote:
> Before I raise a bug report, I wanted to pass it by @misc in case I'm
> confused. It appears there is an issue with link-local addresses at
> least as far as route(8) is concerned. Since May 2, /var/log/messages
> has been getting spammed with the following:
> 
> router$ tail -6 /var/log/messages
> Jul 12 03:02:47 router /bsd: ndp info overwritten for 
> fe80:4::c6ca:2bff:fe5a:cf35 by c4:ca:2b:5a:cf:35 on em0
> Jul 12 03:02:51 router /bsd: ndp info overwritten for 
> fe80:4::c6ca:2bff:fe5a:cf35 by 00:1c:73:00:00:99 on em0
> Jul 12 04:57:30 router /bsd: ndp info overwritten for 
> fe80:4::c6ca:2bff:fe5a:8723 by c4:ca:2b:5a:87:23 on em0
> Jul 12 04:57:34 router /bsd: ndp info overwritten for 
> fe80:4::c6ca:2bff:fe5a:8723 by 00:1c:73:00:00:99 on em0
> Jul 12 06:16:31 router /bsd: ndp info overwritten for 
> fe80:4::c6ca:2bff:fe5a:cf35 by c4:ca:2b:5a:cf:35 on em0
> Jul 12 06:16:35 router /bsd: ndp info overwritten for 
> fe80:4::c6ca:2bff:fe5a:cf35 by 00:1c:73:00:00:99 on em0
> 
> The MAC address 00:1c:73:00:00:99 belongs to the gateway on my ISP's
> side. I have no clue about the other 2 MAC addresses. Anyway, when
> trying to investigate the matter, I found that link-local
> addresses (i.e., fe80::/10) that are not part of fe80::/64, the only
> block that is actually defined to be used per RFC 4291 Section 2.5.6,
> always have the second octet pair as 0:
> 
> router$ route -n get fe80:4::c6ca:2bff:fe5a:cf35%em0 -inet6
>route to: fe80::c6ca:2bff:fe5a:cf35%em0
> destination: fe80::c6ca:2bff:fe5a:cf35%em0
>mask: :::::::
>   interface: em0
>  if address: fe80::7ec2:55ff:fe62:31fb%em0
>priority: 3 ()
>   flags: 
>  use   mtuexpire
>   34 0 85085
> 
> Notice how "route to" does not have the same IP as the IP I passed to
> route(8). Here is another example with a "random" link-local IP:
> 
> router$ route -n get fe80:4:8349:adfe:1ca:2eff:95a:14%em0 -inet6
>route to: fe80:0:8349:adfe:1ca:2eff:95a:14%em0
> destination: fe80::
>mask: ffc0::
> gateway: ::1
>   interface: lo0
>  if address: ::1
>priority: 8 (static)
>   flags: 
>  use   mtuexpire
>   27 32768 0
> 
> Is there something I am missing, or is this a bug?

You are missing something. It is called the KAME hack or embedded scope.
The KAME IPv6 implementation hijacks the 2nd 16-bit part of the address to
store the scope_id.  In some cases this embedded scope escapes into the
printed addresses.
Especially the "ndp info overwritten for" is leaking the scope_id (4)
which is probably the interface index of your em0 interface.
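
A small Python sketch of what the embedding does to the address bytes. It
only illustrates the idea and is not the kernel code; the function names are
made up, and the address and index 4 are the ones from your log.

```python
import ipaddress


def embed_scope(addr: str, ifindex: int) -> str:
    """Store ifindex in the 2nd 16-bit word of a link-local address."""
    packed = bytearray(ipaddress.IPv6Address(addr).packed)
    packed[2:4] = ifindex.to_bytes(2, "big")
    return str(ipaddress.IPv6Address(bytes(packed)))


def strip_scope(addr: str):
    """Undo embed_scope(): return (clean address, ifindex)."""
    packed = bytearray(ipaddress.IPv6Address(addr).packed)
    ifindex = int.from_bytes(packed[2:4], "big")
    packed[2:4] = b"\x00\x00"
    return str(ipaddress.IPv6Address(bytes(packed))), ifindex


print(embed_scope("fe80::c6ca:2bff:fe5a:8723", 4))
print(strip_scope("fe80:4::c6ca:2bff:fe5a:8723"))
```

The first call gives the fe80:4:: form seen in the log, the second one gives
back the plain fe80:: address together with interface index 4.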

Welcome to IPv6, the world would be better without all the garbage.
-- 
:wq Claudio



Re: APCI on old Thinkpad

2023-07-03 Thread Claudio Jeker
Also keep in mind that laptops that old most often had bad or broken early
ACPI implementations and it was better to not enable ACPI on those.
Normally there was some BIOS knob to just use apm(4) which often worked
much better.

On Mon, Jul 03, 2023 at 08:58:45PM +0200, Daniele B. wrote:
> Thanks Peter, point got.
> 
> I also keep going with very old hardware, roughly 10-year-old mini PCs/PCs
> (including a Mac Pro), and we are so used to OpenBSD just working that we
> tend to think no problem will ever arise.
> Sadly, we may forget what is really feasible.
> 
> 
> -- Daniele Bonini
> 
> 
> Jul 3, 2023 14:47:57 Peter N. M. Hansteen :
> 
> > On Mon, Jul 03, 2023 at 01:36:10PM +0200, Michael Hekeler wrote:
> >> oh dear I have forgotten the model number - Sorry!
> >> 
> >> It is Thinkpad 570
> > 
> > I had to look this up, since I had forgotten that Thinkpads used to come
> > with model numbers not prefixed and/or postfixed with letters.
> > 
> > I think one of several issues you will bump into is that the machine is
> > almost a quarter century old (released April 1999 if Wikipedia is to be 
> > trusted),
> > and you may be one of fairly few people who have kept one around this long.
> > 
> > This means in practice that in all likelihood, recent versions of any 
> > now-useful
> > software has been only lightly tested (if at all) on that vintage hardware.
> > 
> > If you can get someone with the right skillset interested (as in, not me, by
> > any measure) it is conceivable that a fix is within reach. That said, 
> > however,
> > I suspect that improving support for more current hardware would tend to
> > take priority when developers decide what to spend their time on.
> > 
> > All the best,
> > Peter
> > 
> > -- 
> > Peter N. M. Hansteen, member of the first RFC 1149 implementation team
> > https://bsdly.blogspot.com/ https://www.bsdly.net/ https://www.nuug.no/
> > "Remember to set the evil bit on all malicious network traffic"
> > delilah spamd[29949]: 85.152.224.147: disconnected after 42673 seconds.
> 

-- 
:wq Claudio



Re: relayd: pfe_route: failed to add gateway 22 Invalid argument

2023-06-29 Thread Claudio Jeker
On Thu, Jun 29, 2023 at 09:34:10AM +0200, Jörg Streckfuß wrote:
> Hi Claudio,
> 
> Am 29.06.23 um 09:01 schrieb Claudio Jeker:
> > On Thu, Jun 29, 2023 at 08:53:05AM +0200, Jörg Streckfuß wrote:
> > > 
> > > Hi list,
> > > 
> > > here is a small addition. Adding and deleting the route to and from 
> > > routing
> > > table on the command line works as expected:
> > > 
> > > fw1 # route add 2001:::::4/128 2001:::::4 -label
> > > geo_service
> > > add host 2001:::::4/128: gateway 2001:::::4
> > > 
> > > fw# route -n show -inet6 | grep 2001:::::4
> > > 2001:::::4  52:01:8d:e4:fd:63  UHLch
> > > 123015 - 3 vlan18
> > > 2001:::::4  2001:::::4  UGHS
> > > 00 - 8 vlan18
> > > 
> > > fw1 # route del 2001:::::4/128 2001:::::4 -label
> > > geo_service
> > > del host 2001:::::4/128: gateway 2001:::::4
> > > 
> > > fw1# route -n show -inet6 | grep 2001:::::4
> > > 
> > > 2001:638:dfce:3000::4  52:01:8d:e4:fd:63  UHLc
> > >  0
> > > 23015 - 3 vlan18
> > > 
> > > 
> > > Why can't relayd add the route to the table and what does the following 
> > > log
> > > concretely mean:
> > > 
> > > 
> > > pfe_route: failed to add gateway 2001:638:dfce:3000::4: 22 Invalid 
> > > argument
> > > 
> > > 
> > 
> > > Running route -n monitor will give you more insight into what is sent to the
> > kernel. At least unless the route message is so mangled that the kernel
> > fails to parse it.
> 
> This is interesting. I ran route -n monitor and run relayd but it says
> nothing. No output at all.
 
Ugh, relayd pfe_route is utterly broken for IPv6. That code never worked,
the encoded message is not properly aligned because struct sockaddr_in6
does not align to a long word boundary.

This function requires a rewrite.
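
To show the alignment problem in numbers, here is a sketch in Python. It
assumes that each sockaddr in a routing message has to be padded to a
multiple of sizeof(long), the way the usual ROUNDUP() macro does it. The
sizes 16 and 28 are the usual sizes of struct sockaddr_in and struct
sockaddr_in6, 8 is sizeof(long) on amd64, and the helper names are made up.

```python
SIZEOF_LONG = 8          # assumption: 64-bit platform such as amd64
SOCKADDR_IN_LEN = 16
SOCKADDR_IN6_LEN = 28


def roundup(n: int, align: int = SIZEOF_LONG) -> int:
    """Round n up to the next multiple of align (align is a power of two)."""
    return align if n <= 0 else 1 + ((n - 1) | (align - 1))


def offsets(lengths, align):
    """Start offset of each sockaddr when each one is padded to align."""
    out, pos = [], 0
    for n in lengths:
        out.append(pos)
        pos += roundup(n, align)
    return out


# dst, mask, gateway packed back to back (align=1) versus properly padded.
print(offsets([SOCKADDR_IN_LEN] * 3, 1), offsets([SOCKADDR_IN_LEN] * 3, SIZEOF_LONG))
print(offsets([SOCKADDR_IN6_LEN] * 3, 1), offsets([SOCKADDR_IN6_LEN] * 3, SIZEOF_LONG))
```

For IPv4 the packed and the padded layout are identical, so a struct without
padding happens to work. For IPv6 the second and third sockaddr land at 28
and 56 instead of 32 and 64.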
-- 
:wq Claudio



Re: relayd: pfe_route: failed to add gateway 22 Invalid argument

2023-06-29 Thread Claudio Jeker
On Thu, Jun 29, 2023 at 08:53:05AM +0200, Jörg Streckfuß wrote:
> 
> Hi list,
> 
> here is a small addition. Adding and deleting the route to and from routing
> table on the command line works as expected:
> 
> fw1 # route add 2001:::::4/128 2001:::::4 -label
> geo_service
> add host 2001:::::4/128: gateway 2001:::::4
> 
> fw# route -n show -inet6 | grep 2001:::::4
> 2001:::::4  52:01:8d:e4:fd:63  UHLch
> 123015 - 3 vlan18
> 2001:::::4  2001:::::4  UGHS
> 00 - 8 vlan18
> 
> fw1 # route del 2001:::::4/128 2001:::::4 -label
> geo_service
> del host 2001:::::4/128: gateway 2001:::::4
> 
> fw1# route -n show -inet6 | grep 2001:::::4
> 
> 2001:638:dfce:3000::4  52:01:8d:e4:fd:63  UHLc 0
> 23015 - 3 vlan18
> 
> 
> Why can't relayd add the route to the table and what does the following log
> concretely mean:
> 
> 
> pfe_route: failed to add gateway 2001:638:dfce:3000::4: 22 Invalid argument
> 
> 

Running route -n monitor will give you more insight into what is sent to the
kernel. At least unless the route message is so mangled that the kernel
fails to parse it.

Also, obscuring all IPs does not help to understand what is going on.
Why do you add a /128 route with the same IP as the gateway? That just makes
no sense.

-- 
:wq Claudio



Re: mp-safe tun

2023-06-26 Thread Claudio Jeker
On Mon, Jun 26, 2023 at 03:21:26PM +, Valdrin MUJA wrote:
> Hello OpenBSD,
> 
> I've been thinking about this since OpenBSD devs do a lot of mp-safe on the 
> network stack:
> Is it possible to make /dev/tun device mp-safe/Multi-queue?

It is rather complicated to do, mainly because a large part of the vnode
subsystem needs to be safe enough to run without the kernel lock.
I tried it and while it kind of works it may also be kind of too
optimistic.

-- 
:wq Claudio



Re: latest amd64 snap hangs on "root on sd0a..."

2023-06-22 Thread Claudio Jeker
On Wed, Jun 21, 2023 at 07:27:44PM +0300, Mikhail wrote:
> Just installed latest amd64 install73.img from cdn.openbsd.org
> 
> OpenBSD 7.3-current (GENERIC.MP) #1253 Tue Jun 20 13:52:16 MDT 2023
> 
> and after installation it can't proceed further than
> 
> root on sd0a (..) swap on sdb0b dump on sd0b
> 
> Was working fine couple days ago.
> 
> Is it known issue?

Could you please share a dmesg of your system? 

Thanks
-- 
:wq Claudio



Re: latest amd64 snap hangs on "root on sdoa..."

2023-06-21 Thread Claudio Jeker
On Wed, Jun 21, 2023 at 01:03:03PM -0600, Chris Waddey wrote:
> Sorry for breaking the thread, I wasn't subscribed to misc, but found
> this in the archives.
> 
> After some testing, it looks like the recent uvm_meter() commit is what
> did this (to my machine at least).
> 
> The git commit for that is 71d823ace2523fb9fee2d1ab9b4d92a18d3f5714.
> 
> I compiled the commit right before it in the logs and booted no problems
> with a GENERIC.MP kernel config, but that one broke it.
> 
> I'm not as familiar with CVS, so apologies for not having the commit
> from there.
> 
> Here is the commit message if that helps, though I think those on tech
> will know it regardless:
> 
> schedcpu, uvm_meter(9): make uvm_meter() an independent timeout
> 
> uvm_meter(9) should not base its periodic uvm_loadav() call on the UTC
> clock.  It also no longer needs to periodically wake up proc0 because
> proc0 doesn't do any work.  schedcpu() itself may change or go away,
> but as kettenis@ notes we probably can't completely remove the concept
> of a "load average" from OpenBSD, given its long Unix heritage.
> 
> So, (1) remove the uvm_meter() call from schedcpu(), (2) make
> uvm_meter() an independent timeout started alongside schedcpu() during
> scheduler_start(), and (3) delete the vestigial periodic proc0 wakeup.
> 
> With input from deraadt@, kettenis@, and claudio@.  deraadt@ cautions
> that this change may confuse administrators who hold the load average
> in high regard.
> 
> Thread: https://marc.info/?l=openbsd-tech&m=168710929409153&w=2
> 
> general agreement with this direction from kettenis@
> ok claudio@
> 
> If I should repost on tech, let me know.
 
Just to be sure.  Did you verify this with self-compiled kernels with and
without that commit?

Please do not compare self-compiled kernels with snapshot kernels since
snapshots may carry additional diffs.

-- 
:wq Claudio



Re: [Bug (?) ld]: ld interprets % weirdly

2023-06-11 Thread Claudio Jeker
On Sun, Jun 11, 2023 at 12:01:04AM -0600, Theo de Raadt wrote:
> I assume you are on an architecture where the linker is LLVM ld,
> otherwise known as ld-lld in OpenBSD (some older architectures
> still use ld-bfd).
> 
> In llvm/lib/Support/Path.cpp, there is code that acts just like you describe:
> 
> void createUniquePath(const Twine &Model, SmallVectorImpl<char> &ResultPath,
>                       bool MakeAbsolute) {
> ...
>   // Replace '%' with random chars.
>   for (unsigned i = 0, e = ModelStorage.size(); i != e; ++i) {
>     if (ModelStorage[i] == '%')
>       ResultPath[i] = "0123456789abcdef"[sys::Process::GetRandomNumber() & 15];
>   }
> 
> 
> It appears in the LLVM universe if you try to create a file with % in the
> name, it has a different interpretation of what that % means, different than
> what you want it to mean.
> 
> https://docs.hdoc.io/hdoc/llvm-project/f1FB0DB2307A8013C.html
> 
> Other than that, I can find no documentation.
 
What a stupid interface, let's rebuild mktemp(3) and not learn from
history. It is not like this is new unless you think 30 years is new...

Humanity is surely doomed
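
For the record, the effect of that loop on a file name can be mimicked in a
few lines of Python. This is only modelled on the snippet quoted above and is
not LLVM's code; the function name and the example file name are made up.

```python
import random

HEXDIGITS = "0123456789abcdef"


def create_unique_path(model: str, rng=random) -> str:
    """Replace every '%' in model with a random hex digit."""
    return "".join(
        HEXDIGITS[rng.randrange(16)] if c == "%" else c for c in model
    )


out = create_unique_path("100%.o")
print(out)
```

So a file the user wanted to call 100%.o comes out as 100 followed by one
random hex digit and .o, and the name changes from run to run.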
-- 
:wq Claudio



Re: Multi path routing with BGPD

2023-06-01 Thread Claudio Jeker
On Thu, Jun 01, 2023 at 04:58:54PM +, Valdrin MUJA wrote:
> Hi Claudio,
> 
> Thanks for your reply. I think this is the saddest news lately.
> At this point, I have a question:
> This should not be a kernel issue, right?
> So, can I use an alternative like bird until this feature is developed?

I'm not sure if bird does multipath on OpenBSD. Guess you will find out.

> ____
> From: Claudio Jeker 
> Sent: Thursday, June 1, 2023 19:34
> To: Valdrin MUJA 
> Cc: MISC@openbsd.org 
> Subject: Re: Multi path routing with BGPD
> 
> On Mon, May 29, 2023 at 07:29:14PM +, Valdrin MUJA wrote:
> > Hello,
> >
> > I try to setup multipath routing environment with OpenBSD's bgpd.
> 
> multipath != add-path. OpenBGPD currently does not do multipath routing.
> It only uses the best path for the FIB and the nexthops are only resolved
> to one gateway.
> 
> > As I understand from man page the keyword is add-path.
> > Here is my environmental report:
> >
> >   1.  In my lab I simulate two wan links for each device.
> >   2.  Each device also has a LAN network to announce.
> >   3.  In the middle of these two devices there is another OpenBSD acting as 
> > Router.
> >
> > Device 1 :
> > WAN1 : 192.168.10.2/24
> > WAN2: 10.1.1.2/24
> > LAN : 172.16.1.1/24
> > GRE1 : 172.31.1.1 -> 172.31.1.2 netmask /24 (over wan1)
> > GRE2 : 172.31.2.1 -> 172.31.2.2 netmask /24 (over wan2)
> >
> > Device 2 :
> > WAN1 : 192.168.20.2/24
> > WAN2: 10.1.2.2/24
> > LAN : 172.16.2.1/24
> > GRE1 : 172.31.1.2 -> 172.31.1.1 netmask /24 (over wan1)
> > GRE2 : 172.31.2.2 -> 172.31.2.1 netmask /24 (over wan2)
> >
> >
> > Router :
> > 192.168.10.1/24
> > 192.168.20.1/24
> > 10.1.1.1/24
> > 10.1.2.1/24
> >
> > -
> >
> > Here bgpd.conf file contents :
> >
> > Device1# cat /etc/bgpd.conf
> > AS 100
> > network 172.16.1.0/24
> > neighbor 172.31.1.2 {
> >   remote-as 100
> >   log updates
> >   announce IPv4 unicast
> > announce add-path recv yes
> > announce add-path send best
> > }
> > neighbor 172.31.2.2 {
> >   remote-as 100
> >   log updates
> >   announce IPv4 unicast
> >   announce add-path recv yes
> >   announce add-path send best
> > }
> > allow quick from { ibgp }
> > allow quick to { ibgp }
> >
> > Device2# cat /etc/bgpd.conf
> > AS 100
> > network 172.16.2.0/24
> > neighbor 172.31.1.1 {
> >   remote-as 100
> >   log updates
> >   announce IPv4 unicast
> >   announce add-path recv yes
> >   announce add-path send best
> > }
> > neighbor 172.31.2.1 {
> >   remote-as 100
> >   log updates
> >   announce IPv4 unicast
> >   announce add-path recv yes
> >   announce add-path send best
> > }
> > allow quick from { ibgp }
> > allow quick to { ibgp }
> >
> > Here bgpctl show outputs:
> >
> > #bgp connection is OK
> >
> > Device1# bgpctl show
> > Neighbor          AS  MsgRcvd  MsgSent  OutQ  Up/Down   State/PrfRcvd
> > 172.31.1.2       100        9        9     0  00:02:34  1
> > 172.31.2.2       100        9        9     0  00:02:34  1
> >
> > # we can see rib tables are ready
> >
> > Device1# bgpctl show rib
> > flags: * = Valid, > = Selected, I = via IBGP, A = Announced,
> >        S = Stale, E = Error
> > origin validation state: N = not-found, V = valid, ! = invalid
> > origin: i = IGP, e = EGP, ? = Incomplete
> >
> > flags ovs destination     gateway       lpref  med aspath origin
> > AI*>  N   172.16.1.0/24   0.0.0.0         100    0 i
> > I*>   N   172.16.2.0/24   172.31.1.2      100    0 i
> > I*m   N   172.16.2.0/24   172.31.2.2      100    0 i
> >
> > Device2# bgpctl show rib
> > flags: * = Valid, > = Selected, I = via IBGP, A = Announced,
> >        S = Stale, E = Error
> > origin validation state: N = not-found, V = valid, ! = invalid
> > origin: i = IGP, e = EGP, ? = Incomplete
> >
> > flags ovs destination     gateway       lpref  med aspath origin
> > I*>   N   172.16.1.0/24   172.31.1.1      100    0 i
> > I*m   N   172.16.1.0/24   172.31.2.1      100    0 i
> > AI*>  N   172.16.2.0/24   0.0.0.0         100    0 i
> >
> >
> > But there is only

Re: Multi path routing with BGPD

2023-06-01 Thread Claudio Jeker
On Mon, May 29, 2023 at 07:29:14PM +0000, Valdrin MUJA wrote:
> Hello,
> 
> I am trying to set up a multipath routing environment with OpenBSD's bgpd.

multipath != add-path. OpenBGPD currently does not do multipath routing.
It only uses the best path for the FIB and the nexthops are only resolved
to one gateway.

> As I understand from man page the keyword is add-path.
> Here is my environmental report:
> 
>   1.  In my lab I simulate two wan links for each device.
>   2.  Each device also has a LAN network to announce.
>   3.  In the middle of these two devices there is another OpenBSD acting as 
> Router.
> 
> Device 1 :
> WAN1 : 192.168.10.2/24
> WAN2: 10.1.1.2/24
> LAN : 172.16.1.1/24
> GRE1 : 172.31.1.1 -> 172.31.1.2 netmask /24 (over wan1)
> GRE2 : 172.31.2.1 -> 172.31.2.2 netmask /24 (over wan2)
> 
> Device 2 :
> WAN1 : 192.168.20.2/24
> WAN2: 10.1.2.2/24
> LAN : 172.16.2.1/24
> GRE1 : 172.31.1.2 -> 172.31.1.1 netmask /24 (over wan1)
> GRE2 : 172.31.2.2 -> 172.31.2.1 netmask /24 (over wan2)
> 
> 
> Router :
> 192.168.10.1/24
> 192.168.20.1/24
> 10.1.1.1/24
> 10.1.2.1/24
> 
> -
> 
> Here bgpd.conf file contents :
> 
> Device1# cat /etc/bgpd.conf
> AS 100
> network 172.16.1.0/24
> neighbor 172.31.1.2 {
>   remote-as 100
>   log updates
>   announce IPv4 unicast
>   announce add-path recv yes
>   announce add-path send best
> }
> neighbor 172.31.2.2 {
>   remote-as 100
>   log updates
>   announce IPv4 unicast
>   announce add-path recv yes
>   announce add-path send best
> }
> allow quick from { ibgp }
> allow quick to { ibgp }
> 
> Device2# cat /etc/bgpd.conf
> AS 100
> network 172.16.2.0/24
> neighbor 172.31.1.1 {
>   remote-as 100
>   log updates
>   announce IPv4 unicast
>   announce add-path recv yes
>   announce add-path send best
> }
> neighbor 172.31.2.1 {
>   remote-as 100
>   log updates
>   announce IPv4 unicast
>   announce add-path recv yes
>   announce add-path send best
> }
> allow quick from { ibgp }
> allow quick to { ibgp }
> 
> Here bgpctl show outputs:
> 
> #bgp connection is OK
> 
> Device1# bgpctl show
> Neighbor          AS  MsgRcvd  MsgSent  OutQ  Up/Down   State/PrfRcvd
> 172.31.1.2       100        9        9     0  00:02:34  1
> 172.31.2.2       100        9        9     0  00:02:34  1
> 
> # we can see rib tables are ready
> 
> Device1# bgpctl show rib
> flags: * = Valid, > = Selected, I = via IBGP, A = Announced,
>        S = Stale, E = Error
> origin validation state: N = not-found, V = valid, ! = invalid
> origin: i = IGP, e = EGP, ? = Incomplete
> 
> flags ovs destination     gateway       lpref  med aspath origin
> AI*>  N   172.16.1.0/24   0.0.0.0         100    0 i
> I*>   N   172.16.2.0/24   172.31.1.2      100    0 i
> I*m   N   172.16.2.0/24   172.31.2.2      100    0 i
> 
> Device2# bgpctl show rib
> flags: * = Valid, > = Selected, I = via IBGP, A = Announced,
>        S = Stale, E = Error
> origin validation state: N = not-found, V = valid, ! = invalid
> origin: i = IGP, e = EGP, ? = Incomplete
> 
> flags ovs destination     gateway       lpref  med aspath origin
> I*>   N   172.16.1.0/24   172.31.1.1      100    0 i
> I*m   N   172.16.1.0/24   172.31.2.1      100    0 i
> AI*>  N   172.16.2.0/24   0.0.0.0         100    0 i
> 
> 
> But there is only one path in FIB table:
> 
> Device1# bgpctl show fib | grep B
> flags: B = BGP, C = Connected, S = Static
>        N = BGP Nexthop reachable via this route
> B        48 172.16.2.0/24        172.31.1.2
> 
> Device2# bgpctl show fib | grep B
> flags: B = BGP, C = Connected, S = Static
>        N = BGP Nexthop reachable via this route
> B        48 172.16.1.0/24        172.31.1.1
> 
> Also my sysctl.conf is ok (net.inet.ip.multipath=1)
> I just want to add multipath routes for my networks dynamically.
> 
> It's ok with static routing(*) but I would like to achieve it dynamically
> with bgpd.
> What is wrong with my configuration? Can you please help me?
> Thanks.
> 
> (*)
> Device1# route add 172.16.2.0/24 172.31.1.2 -mpath
> add net 172.16.2.0/24: gateway 172.31.1.2
> Device1# route add 172.16.2.0/24 172.31.2.2 -mpath
> add net 172.16.2.0/24: gateway 172.31.2.2
> Device1# netstat -rnf inet | grep 172.16.2
> 172.16.2/24        172.31.1.2         UGSP       0        0     -     8 gre1
> 172.16.2/24        172.31.2.2         UGSP       0        0     -     8 gre2
> 
> Device2# route add 172.16.1.0/24 172.31.1.1 -mpath
> add net 172.16.1.0/24: gateway 172.31.1.1
> Device2# route add 172.16.1.0/24 172.31.2.1 -mpath
> add net 172.16.1.0/24: gateway 172.31.2.1
> Device2# netstat -rnf inet | grep 172.16.1
> 172.16.1/24        172.31.1.1         UGSP       0        0     -     8 gre1
> 172.16.1/24        172.31.2.1         UGSP       0        0     -     8 gre2
> 

You don't need add-path for your setup 
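
To make the "multipath != add-path" point concrete, here is a toy Python model of a RIB that holds several paths per prefix and a FIB that only receives the selected one. It is not bgpd code: the names Path, best_path and build_fib are made up for illustration, and the decision process is reduced to localpref and MED with ties going to the path listed first, which is an assumption (the real decision process has more steps).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    prefix: str
    nexthop: str
    localpref: int = 100
    med: int = 0


def best_path(paths):
    # Toy decision process: highest localpref wins, then lowest MED.
    # min() returns the first minimal item, so ties go to the path listed first.
    return min(paths, key=lambda p: (-p.localpref, p.med))


def build_fib(rib):
    # One nexthop per prefix: only the selected path reaches the FIB.
    return {prefix: best_path(paths).nexthop for prefix, paths in rib.items()}


# Two paths for the same prefix, as in the 'bgpctl show rib' output above.
rib = {
    "172.16.2.0/24": [
        Path("172.16.2.0/24", "172.31.1.2"),
        Path("172.16.2.0/24", "172.31.2.2"),
    ],
}
fib = build_fib(rib)
print(len(rib["172.16.2.0/24"]), "paths in the RIB")
print("FIB:", fib)
```

In this model the second path stays visible in the RIB, much like the extra line in the RIB listing, but it never becomes a second route in the FIB.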

Re: Route based IPsec

2023-05-31 Thread Claudio Jeker
On Wed, May 31, 2023 at 06:39:27PM +1000, David Gwynne wrote:
> 
> 
> > On 31 May 2023, at 18:33, Claudio Jeker  wrote:
> > 
> > On Wed, May 31, 2023 at 08:35:45AM +1000, David Gwynne wrote:
> >> 
> >> 
> >>> On 27 May 2023, at 21:40, Stuart Henderson  
> >>> wrote:
> >>> 
> >>> On 2023-05-27, Valdrin MUJA  wrote:
> >>>>   Does OpenBSD have routed based IPsec support?
> >>> 
> >>> Not yet.
> >> 
> >> while you wait, it might be possible to configure a gif tunnel protected
> >> by ipsec transport mode.
> >> 
> > 
> > The annoying bit with gif tunnels in transport mode is the need for static
> > IPs on both sides of the tunnel. I ended up tunneling gif in tunnel mode
> > because of that.
> 
> that's an annoying thing about gif, even without ipsec in the mix.

Indeed. Both gif and gre share this issue.
 
> should i make it possible to specify an interface as the source of local
> addresses on tunnels?
 
Not sure if it is worth the effort since the other end of the tunnel needs
to adjust the tunnel remote address as well. Neither gif nor gre support
authentication. Using wg(4) for that is an option but because of dynamic
routing I ended up packing a gif tunnel into wg(4) (so I'm back to square
one).

-- 
:wq Claudio



Re: Route based IPsec

2023-05-31 Thread Claudio Jeker
On Wed, May 31, 2023 at 08:35:45AM +1000, David Gwynne wrote:
> 
> 
> > On 27 May 2023, at 21:40, Stuart Henderson  
> > wrote:
> > 
> > On 2023-05-27, Valdrin MUJA  wrote:
> >>Does OpenBSD have routed based IPsec support?
> > 
> > Not yet.
> 
> while you wait, it might be possible to configure a gif tunnel protected
> by ipsec transport mode.
> 

The annoying bit with gif tunnels in transport mode is the need for static
IPs on both sides of the tunnel. I ended up tunneling gif in tunnel mode
because of that.

-- 
:wq Claudio



Re: small issue with mpe

2023-05-23 Thread Claudio Jeker
On Wed, May 24, 2023 at 01:31:56PM +1000, David Gwynne wrote:
> 
> 
> > On 23 May 2023, at 17:40, Claudio Jeker  wrote:
> > 
> > On Tue, May 23, 2023 at 07:09:51AM -0000, Stuart Henderson wrote:
> >> On 2023-05-23, David Gwynne  wrote:
> >>> On Sat, May 20, 2023 at 09:44:51AM +0200, Holger Glaess wrote:
> >>>> hi
> >>>> 
> >>>> 
> >>>> it looks like the patch works, but should it not print "tunneldomain"
> >>>> instead of "rdomain"?
> >>> 
> >>> that's an interesting question.
> >>> 
> >>> ifconfig does not aim to produce output that can then be used as input
> >>> for ifconfig again. printing it as rdomain is at least consistent with
> >>> how it's printed on the tunnel: line for things like etherip and gif,
> >>> and i guess the assumption is you can figure out that it's tunneldomain
> >>> from the context.
> >> 
> >> things are a bit inconsistent here - doesn't this actually take an rtable
> >> not an rdomain? (wg uses and prints "wgrtable" for what I think is the
> >> equivalent thing).
> > 
> > I think this is a general issue with tunneldomain. It should be
> > tunneltable since it is used for two things: the route lookup of the tunnel
> > endpoints and altering the mbuf's rdomain on encapsulation / decapsulation.
> > At least in theory this is how it should work but someone needs to verify
> > that all drivers really behave like this.
> 
> ifconfig drv0 rdomain specifies which rdomain can send packets into the
> interface, and which rdomain the packets coming out of the interface
> will use. this is the same on all interfaces whether they're tunnels or
> not.
> 
> ifconfig drv0 tunneldomain specifies the rdomain that the encapsulated
> packets operate in.
> 
> rdomain and tunneldomain (if supported) are always in effect and in the
> same way. packets sent from an rdomain out a tunnel will get the tunnel
> headers added to the packet and the rdomain rewritten to the
> tunneldomain value (which could be 0). encapsulated packets from the
> remote tunnel endpoint have to match the tunneldomain before the tunnel
> interface will match them and decapsulate them, and once they're
> decapsulated the rdomain on the packet is set to the interface rdomain
> value.

I agree with all you wrote. There is one part where the tunneldomain could
be considered an rtableid (and not a pure rdomainid).
When sending out encapsulated packets a route lookup is done using the
tunneldomain value. For this lookup it would make sense to allow an
rtableid. This only affects sending out traffic; on the receive side the
system needs to look up the endpoint using the rdomain, via rtable_l2()
(this is similar to how setrtable(2) and the corresponding getsockopt()
work on sockets).

Now there was never a feature request to send out gre/gif traffic using an
alternate routing table and so I think we can keep this the way it is.

-- 
:wq Claudio
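
The rdomain / tunneldomain behaviour David describes above can be modelled in a few lines of Python. This is only a toy sketch of the described semantics, not kernel code: the Packet and Tunnel classes and their fields are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    payload: str
    rdomain: int
    encapsulated: bool = False


@dataclass(frozen=True)
class Tunnel:
    rdomain: int       # as set with "ifconfig drv0 rdomain N"
    tunneldomain: int  # as set with "ifconfig drv0 tunneldomain N"

    def encap(self, pkt: Packet) -> Packet:
        # Add the tunnel header and rewrite the rdomain to the tunneldomain.
        return Packet(pkt.payload, self.tunneldomain, encapsulated=True)

    def decap(self, pkt: Packet) -> Optional[Packet]:
        # Only encapsulated packets arriving in the tunneldomain match.
        if not pkt.encapsulated or pkt.rdomain != self.tunneldomain:
            return None
        # After decapsulation the packet lives in the interface rdomain.
        return Packet(pkt.payload, self.rdomain, encapsulated=False)


tun = Tunnel(rdomain=1, tunneldomain=0)
outer = tun.encap(Packet("ping", rdomain=1))
inner = tun.decap(outer)
print(outer.rdomain, inner.rdomain)
```

The route lookup for the tunnel endpoint is left out of the model; that lookup is the one place where an rtableid instead of an rdomain id would make a difference.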



Re: small issue with mpe

2023-05-23 Thread Claudio Jeker
On Tue, May 23, 2023 at 07:09:51AM -0000, Stuart Henderson wrote:
> On 2023-05-23, David Gwynne  wrote:
> > On Sat, May 20, 2023 at 09:44:51AM +0200, Holger Glaess wrote:
> >> hi
> >> 
> >> 
> >> it looks like the patch works, but should it not print "tunneldomain"
> >> instead of "rdomain"?
> >
> > that's an interesting question.
> >
> > ifconfig does not aim to produce output that can then be used as input
> > for ifconfig again. printing it as rdomain is at least consistent with
> > how it's printed on the tunnel: line for things like etherip and gif,
> > and i guess the assumption is you can figure out that it's tunneldomain
> > from the context.
> 
> things are a bit inconsistent here - doesn't this actually take an rtable
> not an rdomain? (wg uses and prints "wgrtable" for what I think is the
> equivalent thing).

I think this is a general issue with tunneldomain. It should be
tunneltable since it is used for two things: the route lookup of the tunnel
endpoints and altering the mbuf's rdomain on encapsulation / decapsulation.
At least in theory this is how it should work but someone needs to verify
that all drivers really behave like this.

-- 
:wq Claudio



Re: 'bgpctl show rib in neighbor $peer' no longer shows unfiltered received routes

2023-05-09 Thread Claudio Jeker
On Tue, May 09, 2023 at 09:49:18AM +0200, Rogier Krieger wrote:
> Thanks for the rapid response and proposal.
> I'd wanted to test yesterday but had to postpone.
> 
> On Mon, May 8, 2023 at 12:18 PM Claudio Jeker  
> wrote:
> > Here is a possible solution where a perfect match aborts the detection
> > loop. Now this only works if the labels are in the right order ("in"
> > before "invalid").
> 
> This is similar to what I had in mind, but shorter than what I'd thought of.
> I'll test on -current first and report back. Afterwards, I'll adapt it for
> -release (i.e. the equivalent of r1.124 for parser.c [1]).
> 
> 
> > I wonder if changing "invalid" to "notvalid" or "noteligible" would be a
> > better fix for now...
> 
> Personally, I like the flexibility of keyword freedom, given the small
> one-time price to pay of sorting.
> Sorting may make maintenance a little easier too; at least I've seen
> several recent commits elsewhere to that end.

Right now I favour renaming the keyword since it is simpler. The idea is
to use "disqualified" as the keyword. This has some additional benefits since
invalid is rather overloaded (ovs and avs use invalid, and then there is error,
which is a different kind of invalid).
The routes 'bgpctl show rib invalid' displays are Loc-RIB entries which
cannot be selected in the decision process for various reasons.

-- 
:wq Claudio



Re: 'bgpctl show rib in neighbor $peer' no longer shows unfiltered received routes

2023-05-08 Thread Claudio Jeker
On Mon, May 08, 2023 at 09:14:43AM +0200, Claudio Jeker wrote:
> On Mon, May 08, 2023 at 12:28:58AM +0200, Rogier Krieger wrote:
> > While diagnosing an unrelated matter, I find that 'bgpctl show rib'
> > has difficulty with the 'in' keyword. The 'out' counterpart works as
> > expected. Looking at bgpctl(8), the following should work (but
> > doesn't):
> > $ bgpctl show rib in neighbor $peer
> > ambiguous argument: in
> > valid commands/args:
> > 
> >   invalid
> >   leaked
> >   in
> >   out
> > 
> > 
> > Note: tested this on a 7.3 (w/ bgpd erratum) release system.
> > On a 7.2 release system, I don't see this regression (unsurprising, as
> > bgpctl(8) there doesn't list  'invalid' as a valid 'show rib' option).
> > 
> > I suspect this involves the logic in match_token() from
> > src/usr.sbin/bgpctl/parser.c. I'll take a stab at providing a patch.
> > Meanwhile, I'd appreciate any hints and/or a workaround for the
> > meantime.
> 
> Ugh. This broke with the introduction of "invalid" as keyword.
> The parser is not smart enough to handle keywords with equal prefix.
> 
> Using a different word that does not start with "in" would be the simplest
> fix.
 
Here is a possible solution where a perfect match aborts the detection
loop. Now this only works if the labels are in the right order ("in"
before "invalid").

Since on match the internal state is modified the check needs to include
match == 1 to ensure that the state is correct.

I wonder if changing "invalid" to "notvalid" or "noteligible" would be a
better fix for now...
-- 
:wq Claudio
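
For readers who do not want to dig through parser.c, the idea can be sketched in Python. This is a simplified model, not the bgpctl parser: match_keyword and its exact_abort switch are invented names, and the abort condition (perfect match while match == 1) is taken from the description above, since the tail of the diff is cut off in this archive.

```python
def match_keyword(word, table, exact_abort=True):
    """Return the single keyword selected by `word`, or None if the
    word is unknown or ambiguous."""
    matches = 0
    found = None
    for keyword in table:
        # Same test as strncmp(word, keyword, strlen(word)) == 0.
        if keyword.startswith(word):
            matches += 1
            found = keyword
            # Proposed fix: a perfect match stops the scan, but only
            # while it is still the first match seen.
            if exact_abort and matches == 1 and keyword == word:
                break
    return found if matches == 1 else None


old_order = ["invalid", "leaked", "in", "out"]
new_order = ["in", "invalid", "leaked", "out"]

print(match_keyword("in", old_order, exact_abort=False))  # prefix-only matching
print(match_keyword("in", new_order))                     # "in" listed first
print(match_keyword("in", old_order))                     # "invalid" listed first
```

With prefix-only matching "in" is ambiguous because "invalid" also starts with it. With the abort in place "in" resolves, but only when it is listed before "invalid", which is the ordering requirement mentioned above.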

Index: parser.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpctl/parser.c,v
retrieving revision 1.132
diff -u -p -r1.132 parser.c
--- parser.c	21 Apr 2023 10:49:01 -0000	1.132
+++ parser.c	8 May 2023 10:12:44 -0000
@@ -525,12 +525,14 @@ match_token(int argc, char *argv[], cons
 	struct filter_set	*fs;
 	const char		*word = argv[0];
 	size_t			 wordlen = 0;
+	int			 recheck;
 
 	*argsused = 1;
 	match = 0;
 	if (word != NULL)
 		wordlen = strlen(word);
 	for (i = 0; table[i].type != ENDTOKEN; i++) {
+		recheck = 0;
 		switch (table[i].type) {
 		case NOTOKEN:
 			if (word == NULL || wordlen == 0) {
@@ -548,6 +550,7 @@ match_token(int argc, char *argv[], cons
 		case KEYWORD:
 			if (word != NULL && strncmp(word, table[i].keyword,
 			    wordlen) == 0) {
+				recheck = 1;
 				match++;
 				t = &table[i];
 				if (t->value)
@@ -557,6 +560,7 @@ match_token(int argc, char *argv[], cons
 		case FLAG:
 			if (word != NULL && strncmp(word, table[i].keyword,
 			    wordlen) == 0) {
+				recheck = 1;
 				match++;
 				t = &table[i];
 				res.flags |= t->value;
@@ -630,6 +634,7 @@ match_token(int argc, char *argv[], cons
 		case ASTYPE:
 			if (word != NULL && strncmp(word, table[i].keyword,
 			    wordlen) == 0) {
+				recheck = 1;
 				match++;
 				t = &table[i];
 				res.as.type = t->value;
@@ -687,6 +692,7 @@ match_token(int argc, char *argv[], cons
 				fs->action.community = res.community;
 				TAILQ_INSERT_TAIL(&res.set, fs, entry);
 
+				recheck = 1;
 				match++;
 				t = &table[i];
 			}
@@ -703,6 +709,7 @@ match_token(int argc, char *argv[], cons
 				fs->action.community = res.community;
 				TAILQ_INSERT_TAIL(&res.set, fs, entry);
 
+				recheck = 1;
 				match++;
 				t = &table[i];
 			}
@@ -720,6 +727,7 @@ match_token(int argc, char *argv[], cons
 				fs->action.community = res.community;
 				TAILQ_INSERT_TAIL(&res.set, fs, entry);
 
+				recheck = 1;
 				match++;
 				t = &table[i];
 			}
@@ -771,6 +779,7 @@ match_token(int argc, char *argv[], cons
wordle

Re: 'bgpctl show rib in neighbor $peer' no longer shows unfiltered received routes

2023-05-08 Thread Claudio Jeker
On Mon, May 08, 2023 at 12:28:58AM +0200, Rogier Krieger wrote:
> While diagnosing an unrelated matter, I find that 'bgpctl show rib'
> has difficulty with the 'in' keyword. The 'out' counterpart works as
> expected. Looking at bgpctl(8), the following should work (but
> doesn't):
> $ bgpctl show rib in neighbor $peer
> ambiguous argument: in
> valid commands/args:
> 
>   invalid
>   leaked
>   in
>   out
> 
> 
> Note: tested this on a 7.3 (w/ bgpd erratum) release system.
> On a 7.2 release system, I don't see this regression (unsurprising, as
> bgpctl(8) there doesn't list  'invalid' as a valid 'show rib' option).
> 
> I suspect this involves the logic in match_token() from
> src/usr.sbin/bgpctl/parser.c. I'll take a stab at providing a patch.
> Meanwhile, I'd appreciate any hints and/or a workaround for the
> meantime.

Ugh. This broke with the introduction of "invalid" as keyword.
The parser is not smart enough to handle keywords with equal prefix.

Using a different word that does not start with "in" would be the simplest
fix.

-- 
:wq Claudio



Re: OpenBSD and AMD EPYC/RYZEN 10gb

2023-04-12 Thread Claudio Jeker
On Wed, Apr 12, 2023 at 02:05:02PM +0000, Laura Smith wrote:
> No worries.
> 
> (And for anyone following on-list, I think FreeBSD might have
> subsequently renamed axgbe to something else beginning on ax, I think
> maybe "axa" as per the "history" note on the bottom of this page
> https://www.gsp.com/cgi-bin/man.cgi?section=4=AXP)
> 
 
These integrated network ports are often disabled. I have not found a
reasonably priced system that has them exposed. This is an important reason
why OpenBSD is lacking this driver: a developer needs to get such a system.
None of the Supermicro boards with an AMD EPYC SoC have those 10G ports
exposed.

-- 
:wq Claudio
 
> Sent with Proton Mail secure email.
> 
> --- Original Message ---
> On Wednesday, April 12th, 2023 at 15:00, Mischa  wrote:
> 
> 
> > Hi Laura,
> > 
> > Gotcha... I don't have those laying around. :)
> > 
> > Mischa
> > 
> > On 2023-04-12 15:54, Laura Smith wrote:
> > 
> > > Hi Mischa
> > > 
> > > Thank you for that.
> > > 
> > > However I think perhaps I was a little unclear in my original post, and
> > > for that I apologise.
> > > 
> > > To be (more) clear, I was not talking about "network cards in AMD
> > > computers", I was talking about AMD SoC network ports.
> > > 
> > > e.g. in FreeBSD land, these are known as "axgbe"
> > > (https://reviews.freebsd.org/D25793,
> > > https://doc.dpdk.org/guides/nics/axgbe.html)
> > > 
> > > --- Original Message ---
> > > On Wednesday, April 12th, 2023 at 12:45, Mischa open...@mlst.nl
> > > wrote:
> > > 
> > > > Hi Laura,
> > > > 
> > > > Just received my replacement card (10G RJ45, instead of SFP+) for the
> > > > R6415 EPYC machine.
> > > > It works without any problems in 7.3.
> > > > 
> > > > bnxt0: flags=808843
> > > > 
> > > > mtu 1500
> > > > lladdr 2c:ea:7f:ad:ff:3e
> > > > index 3 priority 0 llprio 3
> > > > groups: egress
> > > > media: Ethernet autoselect (10GbaseT full-duplex,rxpause,txpause)
> > > > status: active
> > > > inet 192.168.1.107 netmask 0xff00 broadcast 192.168.1.255
> > > > 
> > > > The dmesg you can find at:
> > > > https://dmesgd.nycbug.org/index.cgi?do=view=7047
> > > > The chipset for the SFP+ card is the same.
> > > > 
> > > > Mischa
> > > > 
> > > > On 2023-04-12 12:01, Laura Smith wrote:
> > > > 
> > > > > Has anyone had the opportunity to experiment using OpenBSD in
> > > > > conjunction with AMD EPYC/RYZEN native 10gb ports ?
> > > > > 
> > > > > As far as I can see there are no drivers for it in stable ? But maybe
> > > > > someone's been playing with it on the bleeding-edge ?
> > > > > 
> > > > > Thanks !
> > > > > 
> > > > > Laura
> 



Re: rdomains finally working!!

2023-04-03 Thread Claudio Jeker
On Mon, Apr 03, 2023 at 10:53:26AM +0100, Kaya Saman wrote:
> Hey guys,
> 

...

> Taking an excerpt from the website I was following:
> 
> https://www.packetmischief.ca/2011/09/20/virtualizing-the-openbsd-routing-table/
> 
> Citing:
> 
> Creating a loopback interface in rdomain 2 so that Host 1 can talk to Host 2
> would look like:
> 
> ifconfig lo2 rdomain 2 127.0.0.1
> route -T 2 add 192.168.1/24 127.0.0.1
> Since lo2 is created inside rdomain 2, the IP address assigned to it doesn't
> conflict with lo0 in rdomain 0.
> 
> 
> Sure I can see traffic inside one of the loopbacks and tcpdump does claim
> "pass out" but then nothing else happens. The other loopback interfaces have
> no traffic in them and the destination network has no traffic either.

This is very much expected since you probably did not carefully read the
cited website.

You need a special pf.conf setup to make that work. One caveat
mentioned in the article is that the default pf.conf ruleset skips lo(4)
interfaces and so the traffic will just be dropped (since there is no
state lookup and so no way to bounce the reverse traffic back into the
other rdomain).

In general I would suggest using pair(4) to route traffic between rdomains.
Doing it in pf(4) gives you more control but it requires careful handling
of the rulesets (as you noticed).

-- 
:wq Claudio



Re: Folks are there any tips to improve page load times on smokeping running on OpenBSD

2023-03-07 Thread Claudio Jeker
On Tue, Mar 07, 2023 at 08:36:24AM +0000, Stuart Henderson wrote:
> On 2023/03/07 07:10, Tom Smyth wrote:
> > I'm running smokeping (fcgi) and rrdcached on top of OpenBSD to smokeping
> > about 150 devices.
> > Page load times can take 30 seconds to 1 minute.
> > Is there any way to speed this up?
> > 
> > I'm running OpenBSD 7.2 in an amd64 VM on top of an SSD array.
> > 
> > Any tips or tricks welcome ...
> 
> One quick thing to try is updating to -current, I made some changes to
> the rrdtool port which may possibly help a little.
> 
> Check that smokeping is actually using rrdcached (watch top while
> opening a page) - the pkg-readme only gives instructions for passing the
> required fastcgi variable through for nginx, I don't know how to do that
> for httpd (or whether it's actually possible).
> 
> Other than that, rrdtool/rrdcached is just slow on OpenBSD. If it's
> anything like mine you'll see high cpu spin % in top while it's busy.
> You can try changing the number of cores in the VM - if you've given it
> lots of cores try *reducing* it a bit. To pick a number out of the air
> I'd suggest probably 4-6. (mine is bare metal and I can't drop the
> number short of kernel hacks to set more cores offline).
> 
> You can hide the slowness at the loss of dynamic functionality in the
> web interface by pre rendering the html/graphs from a cron job rather
> than using the fastcgi (see the pkg-readme). But other than the above
> I'm out of ideas to actually make it run faster.
> 
> (If anyone interested in poking at kernel locks would like flamegraphs
> from my monitoring box - librenms/smokeping/icinga2/mariadb with lots
> of rrdtool/rrdcached - let me know. Spin is pretty brutal.)

No need to collect flamegraphs, the issue is massive contention on the
kernel lock because of high IO load. I see similar behaviour with iogen.
Currently competing read and write calls clash with the async buffer
handling, which also requires the kernel lock to finish its work. So more
concurrency makes it worse. Fixing this is a major task.

-- 
:wq Claudio



Re: Recommended place to store static arp entries

2023-02-28 Thread Claudio Jeker
On Tue, Feb 28, 2023 at 03:30:18PM +0200, Cristian Danila wrote:
> Dear Misc,
> 
> I would really appreciate if more experienced members of you
> could suggest if there is a dedicated place or recommended
> place for OpenBSD where static arp entries should be stored.
> I found many answers over the internet, in some books it is
> mentioning /etc/netstart.
> Also, in a very old OpenBSD thread I see that a possible
> /etc/arp.conf was discussed at some point:
> https://marc.info/?l=openbsd-bugs=103773290509612=2
> In the same thread rc.conf was mentioned, but rc.conf is definitely
> a file that states that it should not be edited.
> Or maybe rc.conf.local as an alternative?
> 
> Where do you recommend storing static arp entries?

To be honest I never had the need to store static arp entries. So for me
the best place is /dev/null. Now if I really had to choose I would select
the interface's hostname.if file to add static entries. It is the place
where the interface gets its network, which is what arp entries hang
off of. It will all be configured together and immediately usable.

-- 
:wq Claudio



Re: Performance optimizing OpenBSD 7.2

2023-02-15 Thread Claudio Jeker
On Wed, Feb 15, 2023 at 01:01:10PM -0000, Stuart Henderson wrote:
> On 2023-02-15, Lars Bonnesen  wrote:
> > One says:
> >
> > # pfctl -s info
> > Status: Enabled for 0 days 10:56:43  Debug: err
> >
> > State Table                          Total             Rate
> >   current entries                    91680
> 
> Lots of entries, close to the default:
> 
> $ doas pfctl -sm 
> states        hard limit   100000
> src-nodes     hard limit    10000
> frags         hard limit    65536
> tables        hard limit     1000
> table-entries hard limit   200000
> pktdelay-pkts hard limit    10000
> anchors       hard limit      512
> 
> >   half-open tcp                       4032
> >   searches                      3132304294        79494.1/s
> >   inserts                         60916552         1546.0/s
> >   removals                        60824872         1543.7/s
> > Counters
> >   match                           79164265         2009.1/s
> >   bad-offset                             0            0.0/s
> >   fragment                               1            0.0/s
> >   short                                  0            0.0/s
> >   normalize                              0            0.0/s
> >   memory                           1768012           44.9/s
> 
> And this most likely means that you've been bumping into the
> state limit plenty of times already.
> 
> >   bad-timestamp                          0            0.0/s
> >   congestion                          1201            0.0/s
> >   ip-option                              0            0.0/s
> >   proto-cksum                          387            0.0/s
> >   state-mismatch                  82794949         2101.2/s
> 
> Loads of state mismatches and, looking at the rate, this is
> probably on an ongoing basis.
> 
> Check to make sure that all packets match either a "pass" or "block"
> rule (the easiest way to do this is usually to have a simple "block"
> or "block log" as the first rule) - if you don't have any matching
> rule in the config, there is an implicit default which passes traffic
> *without* creating state.
> 
> (One particularly common result of this is that TCP window scaling
> isn't handled properly such that longer lived or fast TCP connections
> are likely to slow down or stall.)
> 
> You might also need to bump the state limit, but I'd check the above
> first because the high number of states might be caused because of
> mismatches.

I think the state-mismatch is a result of hitting the state limit and not
the other way around.  At over 90'000 states the default timeouts are
reduced by more than 50% and so states are removed too soon resulting in a
state-mismatch.

So first bump the limit up and then look at the counters again.

-- 
:wq Claudio
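
A rough sketch of the arithmetic behind the "reduced by more than 50%" remark, in Python. It assumes the defaults as documented in pf.conf(5) to the best of my knowledge: a state limit of 100000, adaptive.start at 60% of the limit, adaptive.end at 120% of the limit, and a linear scale factor of (adaptive.end - states) / (adaptive.end - adaptive.start). The function name adaptive_scale is made up for this example.

```python
def adaptive_scale(states, limit=100000):
    """Factor by which pf timeouts are scaled at a given state count
    (assumed defaults: start = 60% of limit, end = 120% of limit)."""
    start = limit * 60 // 100
    end = limit * 120 // 100
    if states <= start:
        return 1.0   # below adaptive.start: timeouts untouched
    if states >= end:
        return 0.0   # at adaptive.end: timeouts become zero
    return (end - states) / (end - start)


for states in (91680, 93847):
    print(states, round(adaptive_scale(states), 3))

# The same state count after doubling the limit.
print(round(adaptive_scale(91680, limit=200000), 3))
```

Under these assumptions both boxes run with timeouts below half of their configured values, and raising the limit moves adaptive.start above the current state count so the scaling stops.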



Re: Performance optimizing OpenBSD 7.2

2023-02-15 Thread Claudio Jeker
On Wed, Feb 15, 2023 at 01:39:54PM +0100, Lars Bonnesen wrote:
> One says:
> 
> # pfctl -s info
> Status: Enabled for 0 days 10:56:43  Debug: err
> 
> State Table                          Total             Rate
>   current entries                    91680
>   half-open tcp                       4032
>   searches                      3132304294        79494.1/s
>   inserts                         60916552         1546.0/s
>   removals                        60824872         1543.7/s
> Counters
>   match                           79164265         2009.1/s
>   bad-offset                             0            0.0/s
>   fragment                               1            0.0/s
>   short                                  0            0.0/s
>   normalize                              0            0.0/s
>   memory                           1768012           44.9/s
>   bad-timestamp                          0            0.0/s
>   congestion                          1201            0.0/s
>   ip-option                              0            0.0/s
>   proto-cksum                          387            0.0/s
>   state-mismatch                  82794949         2101.2/s
>   state-insert                         230            0.0/s
>   state-limit                            0            0.0/s
>   src-limit                              0            0.0/s
>   synproxy                               0            0.0/s
>   translate                              0            0.0/s
>   no-route                               0            0.0/s
> 
> The other says:
> 
> # pfctl -s info
> Status: Enabled for 0 days 10:39:38  Debug: err
> 
> State Table                          Total             Rate
>   current entries                    93847
>   half-open tcp                       8441
>   searches                      3900545422       101634.9/s
>   inserts                         69463584         1810.0/s
>   removals                        69369737         1807.5/s
> Counters
>   match                          752203697        19599.9/s
>   bad-offset                             0            0.0/s
>   fragment                               0            0.0/s
>   short                                  0            0.0/s
>   normalize                              2            0.0/s
>   memory                            212454            5.5/s
>   bad-timestamp                          0            0.0/s
>   congestion                             0            0.0/s
>   ip-option                              0            0.0/s
>   proto-cksum                            0            0.0/s
>   state-mismatch                  33380332          869.8/s
>   state-insert                           0            0.0/s
>   state-limit                            0            0.0/s
>   src-limit                              0            0.0/s
>   synproxy                               0            0.0/s
>   translate                              0            0.0/s
>   no-route                               0            0.0/s
> 
> What does that tell us?

That you need to increase the state limit in pf.
pfctl -s info should not report any memory error.
pfctl -sm will show you the current limits. pf.conf(5) has info on how to
increase the limit (set limit).

-- 
:wq Claudio
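
If you want to watch for this automatically, a small Python sketch like the one below can pick the "memory" counter out of saved 'pfctl -s info' output. The helper names are invented, and the parsing assumes the three-column layout shown in this thread; treat it as an illustration rather than a robust parser.

```python
import re

# A shortened sample in the layout of 'pfctl -s info' (values from this thread).
SAMPLE = """\
State Table                          Total             Rate
  current entries                    91680
  searches                      3132304294        79494.1/s
Counters
  match                           79164265         2009.1/s
  memory                           1768012           44.9/s
  state-limit                            0            0.0/s
"""

COUNTER_RE = re.compile(r"^\s+(\S+)\s+(\d+)\s+[\d.]+/s\s*$")


def parse_counters(text):
    """Return {name: total} for the lines after the "Counters" heading."""
    counters = {}
    in_counters = False
    for line in text.splitlines():
        if line.strip() == "Counters":
            in_counters = True
        elif in_counters:
            m = COUNTER_RE.match(line)
            if m:
                counters[m.group(1)] = int(m.group(2))
    return counters


def hit_state_limit(text):
    # A non-zero "memory" counter is the hint discussed above.
    return parse_counters(text).get("memory", 0) > 0


print(parse_counters(SAMPLE))
print(hit_state_limit(SAMPLE))
```

When the check fires, the knob to turn is the "set limit" option in pf.conf(5), for example `set limit states 200000` (the value is just an example; pick one that fits your traffic and memory).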



Re: Performance optimizing OpenBSD 7.2

2023-02-15 Thread Claudio Jeker
On Wed, Feb 15, 2023 at 10:28:57AM +0100, Gábor LENCSE wrote:
> Hi Lars,
> 
> > I downscaled from 8 to 4 vCPUs and from 8 to 4 gig RAM - and the two obsd
> > now seems to hold the packages decently.
> 
> As for performance optimization, I think the direction is good, and perhaps
> you could go even further if you have a load balancing device that can
> distribute the traffic among the multiple VMs.

Not sure why reducing the memory should help. Also reducing the number of
virtual CPUs has probably little effect as well. The main point in reducing
the number of cores and disabling threads is to give modern CPUs more
thermal/power headroom to run the fewer CPUs at a higher clockspeed.
I doubt you get the same effect on vCPUs.
 
> In OpenBSD, the packet forwarding happens single threaded, so the
> performance of your system does not benefit much from the 4 cores.

That is not quite correct anymore.

-- 
:wq Claudio



Re: OpenBSD as a transparent switch filter

2023-01-24 Thread Claudio Jeker
On Tue, Jan 24, 2023 at 11:43:08AM +0000, Tom Smyth wrote:
> Hello Cristian,
> if you want to filter on layer 2 you would need to use a bridge;
> have a look at ifconfig(8).
> Bridge filter rules can be added to ports in the bridge.
> You can also tag traffic in bridge filter rules and then use PF to
> filter it.
> 
> But if your objective is to isolate ports from each other, this can
> be achieved with protected port groups;
> again, check out ifconfig(8).
> TL;DR version: bridge ports in the same protected port group are
> isolated from each other.
> If port isolation is all you're looking for (no other detailed
> filtering) and veb(4) supports protected ports (I'm not sure), then that
> would be faster,
> but to my shame I have not tried out veb(4).
> 
> I hope this is of some use...
> 

The problem is not veb(4) vs bridge(4) (both should work and I would
suggest you try to stay away from bridge(4)). The problem is the hairpin
on the single interface to the switch. AFAIK neither veb(4) nor bridge(4)
will send a packet back out on the same port it was received on. Doing so
can result in packet loops.

 
> On Tue, 24 Jan 2023 at 11:29, Cristian Danila  wrote:
> >
> > Hello
> >
> > I have a more difficult task that I would like to solve with OpenBSD
> > and I would really
> > appreciate any ideas if it is possible to achieve such.
> >
> > I have:
> > - one OpenBSD box with one Ethernet port
> > - one big switch with multiple devices connected
> >
> > All switch ports are isolated by each other with one exception:
> > - All ports can communicate with only one Ethernet port(let's say port 20)
> >
> > Now what i would like to achieve is to connect an Ethernet cable between
> > OpenBSD box and port 20 of the switch, and make OpenBSD a transparent
> > filtering hub.
> >
> > So I need OpenBSD box to be a transparent bridge and filter between
> > clients of the switch.
> >
> > Can anybody suggest a point where I can think about?
> > I was thinking initially to add the nic(em0) to veb0 then with link1
> > achieve L3 filtering but
> > definitely I think I miss something important.
> > I am open to research everything is needed for it but I miss a
> > starting point and I would
> > really appreciate any hint.
> >
> > Kind regards,
> > Claudiu
> >
> 
> 
> -- 
> Kindest regards,
> Tom Smyth.
> 

-- 
:wq Claudio



Re: OpenBGDP IPv6 ignoring set localpref parameter

2023-01-09 Thread Claudio Jeker
On Mon, Jan 09, 2023 at 11:59:22AM -0500, Matt wrote:
> Hello list,
> 
>  
> 
> I've run across an interesting issue which I think might be something I did
> wrong but here goes. Below is my configuration file for bgpd.conf. I will
> also give you the interface configurations for the two tunnels that I am
> running. When I show the RIB using bgpctl show rib, I notice that the set
> localpref parameter is not being applied properly to IPv6.
> 
>  
> 
> #/etc/hostname.wg0
> wgkey 
> wgpeer  wgendpoint 47.87.173.98 21764 wgaip
> 192.168.220.190/32 wgaip 172.20.53.98/32 wgaip 172.20.0.0/14 wgaip
> fe80::ade1 wgaip fe80::ade0 wgaip fd00::/8 wgpka 20
> inet 192.168.220.190/32
> inet6 fe80::ade1%wg0
> descr "TO-KIOUBIT"
> up
> !route add -host 172.20.53.98 192.168.220.190
> !route add -inet6 fe80::ade0 fe80::ade1%wg0
> !route add -inet6 fd00::/8 fe80::ade1%wg0
> 
> #/etc/hostname.gre0
> 172.21.83.84 172.21.83.85
> tunnel 173.49.42.100 81.2.241.46
> descr "TO-NOP.HU"
> up
> !ifconfig gre0 inet6 fd40:cc1e:c0de::252 fd40:cc1e:c0de::251
> 
> #/etc/bgpd.conf
> ASN="4242421764"
> 
> AS $ASN
> router-id 192.168.220.190
> 
> prefix-set mynetworks {
>     172.20.165.192/27
>     fd0b:7449:62d2::/48
> }
> 
> prefix-set nothankyou {
>     10.0.0.0/8
> }
> 
> network prefix-set mynetworks set large-community $ASN:1:1
> 
> group "kioubit" {
>     set localpref 20
>     neighbor 172.20.53.98 {
>         remote-as 4242423914
>         descr "TO-KIOUBIT-IPV4-US2"
>     }
> 
>     neighbor fe80::ade0 {
>         remote-as 4242423914
>         descr "TO-KIOUBIT-IPV6-US2"
>     }
> }
> 
> group "mc36" {
>     set localpref 10
>     neighbor 172.21.83.85 {
>         remote-as 4242421955
>         descr "TO-NOP.HU-IPV4"
>     }
> 
>     neighbor fd40:cc1e:c0de::251 {
>         remote-as 4242421955
>         descr "TO-NOP.HU-IPV6"
>         set localpref 10
>     }
> }
> 
> deny quick from ebgp prefix-set mynetworks or-longer
> deny quick from ebgp prefix-set nothankyou or-longer
> deny quick from any max-as-len 8
> 
> allow to ebgp prefix-set mynetworks large-community $ASN:1:1
> allow from ebgp ovs valid
> 
> match from ebgp set { large-community delete $ASN:*:* }
> match from any community GRACEFUL_SHUTDOWN set { localpref 0 }
> 
> include "/etc/roa-set.conf"
> 
>  
> 
> When I type bgpctl show rib, I see that the route selected for IPv6 traffic
> is going through the neighbor fd40:cc1e:c0de::251 and not fe80::ade0.
> Ideally, I'd rather have IPv6 go through the neighbor fe80::ade0 as that one
> is on my continent. Below is an example from the show rib statement. I don't
> even know why the fe80::ade0 address is not even showing up in the output.
> 
>  
> 
> *>  V fd00:bb:5bf3::/48 fd40:cc1e:c0de::251 10 0 4242421955 4242423088 4242420549 i
>     V fd00:bb:5bf3::/48 :: 20 0 4242423914 4242420549 i
> 
>  
> 
> I have verified that the neighbor fe80::ade0 is actually getting a
> connection and sending me route updates. Here is an example:
> 
>  
> 
> V fdff:feed:c0de::/48 :: 20 0 4242423914 4242420585 4242422980 210074 64719 65043 4242420138 i
> 
>  
> 
> Any ideas?

Hard to judge from the little information you share but the :: nexthop is
for sure not good. Because of this the route to fd00:bb:5bf3::/48 via AS
4242423914 is not valid and can't be selected as route.
Not sure what exactly goes on there but you need to fix that.

Also check out 'bgpctl show rib nei fe80::ade0 in' to see the unfiltered
routes.

-- 
:wq Claudio



Re: bgpd.conf rules changed?

2022-12-19 Thread Claudio Jeker
On Mon, Dec 19, 2022 at 12:41:26PM +0100, Toni Mueller wrote:
> 
> Hi,
> 
> I am trying to upgrade an OpenBSD based BGP router from an old version
> to 7.2. But on OpenBSD 7.2, the config file results in several errors,
> despite the man page not indicating any thing "obvious".
> 
> Eg. I get syntax errors on
> 
>   softreconfig in yes
>   softreconfig out yes
>   announce self
>   announce all
>   announce default-route
 
You are updating from a very old version of OpenBGPD.

softreconfig is gone. softreconfig is now on by default and can't be
turned off. Just remove all these lines.

announce was replaced with export with the introduction of a default deny
rule. So announce none became export none and announce default-route is
now export default-route. announce self no longer exists and must be
written as an explicit allow rule. See the example bgpd.conf file for a
suggestion.
 
> I also get errors on
> 
>   tcp md5sig password  somesecrethere
> 
> if the secret contains special characters.
 
Always use "" around non-basic strings. tcp md5sig password "some secret"
should work.
 
> I have tried to comment the softreconfig lines, but can't do away with
> the 'announce' statements.
> 
> Is there some overview about what changed over the course of time, and
> possibly, some better error messages to help diagnose the errors?

Have a look at the current example bgpd.conf file. It shows how a config
and especially example filters should be written.
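
As a rough illustration only, a migrated config might look like the fragment below. It is a sketch assembled from directives that appear elsewhere in this archive and in the example file, not a drop-in replacement; the AS numbers, addresses, prefix and password are documentation placeholders, and the large-community tagging is just one way to express what `announce self` used to do.

```
ASN="64496"
AS $ASN

prefix-set mynetworks {
	192.0.2.0/24
}

# originate own prefixes and tag them
network prefix-set mynetworks set large-community $ASN:1:1

neighbor 198.51.100.1 {
	remote-as 64497
	descr "upstream"
	# quote anything that is not a plain word
	tcp md5sig password "some secret!"
	# no softreconfig or announce lines here anymore
}

# was "announce self": explicitly allow only the tagged own prefixes out
allow to ebgp prefix-set mynetworks large-community $ASN:1:1
```

Running `bgpd -nv -f bgpd.conf` on the result checks the syntax without starting the daemon.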

Some of these changes were covered in
https://www.openbsd.org/faq/upgrade65.html

Here are the commit messages:
date: 2018/06/13 09:33:51;  author: claudio; commitid: oGYqi7HT1AMsWI15;
Deprecate announce (all|self|none|default-route)
The announce keyword was overloaded and confused a lot of operators, time
to clean it up and while there incorporate RFC8212 guideline for
propagation.
- `announce all` is the new default but the default deny filter will
  make sure that by default nothing is leaked
- `announce self` is no more and results in syntax error
- `announce none` is now `export none`
- `announce default-route` becomes `export default-route`
- the filters are switched to a default deny rule both incoming and
  outgoing

You most certainly need to adjust your config!

Best is to change the config in advance by using `announce all` explicitly on
all neighbors and adding `deny from any` and `deny to any` at the start of
your filters and adjust the rest of the filters to still produce the same
result.  `bgpd -nv -f bgpd.conf ` and `bgpctl show rib out nei foo` are good
tools to verify the changes.
Lots of discussions with job@, deraadt@, sthen@
OK job@

date: 2017/08/11 16:02:53;  author: claudio; commitid: TArqhzl9aciTsGlE;
softreconfig in and out are on by default for ever and machines now have
enough memory that it does not make sense to provide these knobs anymore.
They just make the code more complex for no much gain.
OK phessler@, benno@

-- 
:wq Claudio



Re: ex/vi 100% CPU when STDIN_FILENO set to O_NONBLOCK

2022-12-12 Thread Claudio Jeker
On Sun, Dec 11, 2022 at 04:10:41PM -0800, Jeremy Mates wrote:
>  ...
>  42136 ex   RET   read -1 errno 35 Resource temporarily unavailable
>  42136 ex   CALL  read(0,0x3d94b585400,0xff)
>  42136 ex   RET   read -1 errno 35 Resource temporarily unavailable
>  42136 ex   CALL  read(0,0x3d94b585400,0xff)
>  ...
>  
> this condition can be reproduced with:
> 
>   #include <sys/wait.h>
>   #include <err.h>
>   #include <fcntl.h>
>   #include <stdlib.h>
>   #include <unistd.h>
>   #define TARGET_FD STDIN_FILENO
>   int main(int argc, char *argv[]) {
>   int flags, status;
>   pid_t pid;
>   pid = fork();
>   if (pid < 0) err(1, "fork failed");
>   if (pid == 0) {
>   flags = fcntl(TARGET_FD, F_GETFL, 0);
>   if (flags == -1) err(1, "fcntl getfl failed");
>   flags |= O_NONBLOCK;
>   flags = fcntl(TARGET_FD, F_SETFL, flags);
>   if (flags == -1) err(1, "fcntl setfl failed");
>   argv++;
>   execvp(*argv, argv);
>   err(1, "execvp failed");
>   }
>   wait(&status);
>   exit(0);
>   }
> 
> and then running whatever-the-above-was-compiled-to as
> 
> ./whatever /usr/bin/ex
> 
> or also under modern code that for some reason sets O_NONBLOCK and
> forgets to turn it off when calling out to an editor, hypothetically
> 
> https://github.com/osa1/tiny
> 
> and likely other, similar programs. Probably, O_NONBLOCK should be
> disabled on STDIN_FILENO before calling out to unknown programs.
> Probably, vi should be patched to not eat CPU when the previous case is
> not handled.
> 
> Thoughts?

I think this is the wrong way around. The callers need to be fixed to pass
a blocking stdin to programs since that is what every unix utility
expects. What you propose is to fix every unix utility to have such a check
at the start of main. Sorry but no.
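
A minimal sketch of what the caller-side fix could look like, assuming the calling program forks and execs the editor itself. The helper name `make_blocking` is made up for this example; only fcntl(2) with F_GETFL and F_SETFL is used.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper for the calling program: clear O_NONBLOCK on fd
 * before exec'ing an editor or any other utility.
 * Returns 0 on success, -1 on error (errno is set by fcntl).
 */
int
make_blocking(int fd)
{
	int flags;

	assert(fd >= 0);
	if ((flags = fcntl(fd, F_GETFL)) == -1)
		return -1;
	if ((flags & O_NONBLOCK) == 0)
		return 0;		/* already blocking, nothing to do */
	return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1 ? -1 : 0;
}
```

The caller would run `make_blocking(STDIN_FILENO)` in the child between fork and execvp. File status flags live on the shared open file description, so the parent sees the change as well and may need to set O_NONBLOCK again once the child has exited.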
 
> diff --git usr.bin/vi/cl/cl_main.c usr.bin/vi/cl/cl_main.c
> index 33614c99594..f87a04cad8b 100644
> --- usr.bin/vi/cl/cl_main.c
> +++ usr.bin/vi/cl/cl_main.c
> @@ -54,7 +54,7 @@ main(int argc, char *argv[])
>   CL_PRIVATE *clp;
>   GS *gp;
>   size_t rows, cols;
> - int rval;
> + int flags, oflags, rval;
>   char *ttype;
>  
>   /* Create and initialize the global structure. */
> @@ -89,6 +89,14 @@ main(int argc, char *argv[])
>   /* Ex wants stdout to be buffered. */
>   (void)setvbuf(stdout, NULL, _IOFBF, 0);
>  
> + /* Ensure blocking I/O to avoid 100% CPU on EAGAIN */
> + if ((flags = fcntl(STDIN_FILENO, F_GETFL, 0)) == -1)
> + exit (1);
> + oflags = flags;
> + flags &= ~O_NONBLOCK;
> + if (fcntl(STDIN_FILENO, F_SETFL, flags) == -1)
> + exit (1);
> +
>   /* Start catching signals. */
>   if (sig_init(gp, NULL))
>   exit (1);
> @@ -102,6 +110,9 @@ main(int argc, char *argv[])
>   /* Clean up the terminal. */
>   (void)cl_quit(gp);
>  
> + /* Restore flags */
> + fcntl(STDIN_FILENO, F_SETFL, oflags);
> +
>   /*
>* XXX
>* Reset the O_MESG option.
> 

-- 
:wq Claudio



Re: bgp conditional advertisement

2022-11-30 Thread Claudio Jeker
On Thu, Dec 01, 2022 at 01:01:16AM +0200, Gregory Edigarov wrote:
> Hello, 
> 
> Having two sites in different physical locations, siteA is connected
> via uplink1 and uplink2, siteB is connected via uplink3 and uplink4.
> I want to announce prefixes from siteB if ASn not found originating
> from siteA, and vice versa. I.e. a feature that will work alike 'enforce
> localas yes' but start announces when ASn is gone. I could done it with
> some scripting, but would prefer to have it in bgpd. 
> Is this possible solely with OpenBGPD?

Run an ibgp session between siteA and siteB. Announce only your prefixes
on those sessions. Tag them with a community. Make sure that these
prefixes are more preferred than the one you put in as backup. Filter out
prefixes with the tag. More or less like this:

# backup route using low localpref to be less preferred
network 192.0.2.0/24 set { localpref 1 }

# send my networks to siteA tagged with community
deny to siteA
allow to siteA prefix-set mynetworks set community local-as:42
# filter out announcement originated from siteA
deny to any community local-as:42

-- 
:wq Claudio



Re: slaacd, MTUs, and pledge

2022-11-21 Thread Claudio Jeker
On Sun, Nov 20, 2022 at 05:28:06PM -0500, Stefan R. Filipek wrote:
> My router advertises its MTU over ICMPv6 router advertisements. It's
> somewhat large (9216), and exceeds the hardware capabilities of my
> OpenBSD system's rge interface (9194). This results in a bunch of
> noisy log messages of:
> 
> > slaacd[...]: failed to set MTU: Invalid argument
> 
> And the obvious outcome where slaacd doesn't actually adjust the MTU
> to something larger.
> 
> I thought I'd be helpful and make a patch where slaacd clamps to the
> maximum hardware capability before attempting to set the MTU. However,
> I got blocked by pledge: There currently is no pledge that gives
> access to SIOCGIFHARDMTU.
> 
> So, some questions arise:
> 1. Does it make sense to add SIOCGIFHARDMTU (and maybe SIOCGIFMTU too)
> to pledge("route")?
> 2. Should slaacd clamp at all or or have some additional settings for
> MTU control?

You announce an MTU that is larger than the interface can handle. In other
words you may end up with packet loss. The only sane fix for your issue is
to lower your RA's mtu from 9216 down to the max of what all your hardware
on that segment can handle. If rge(4) has the lowest MRU then it has to be
9194. Else a system may try to send a 9200 byte packet to your rge(4)
which will fail and it will take a lot of time and resources to figure out
why.

I see no reason to change anything right now for this. Maybe the error
message could include the number it tries to set. But slaacd should fail
as hard as possible in this case because you can't properly connect to
this network.

-- 
:wq Claudio



Re: bgpd VPNs broken in kroute with 7.2?

2022-11-04 Thread Claudio Jeker
On Fri, Nov 04, 2022 at 10:18:26AM +0300, Bars Bars wrote:
> Hi, Claudio!
> 
> It seems there were at least two issues:
> 1. VPN routes were never installed to fib (with errno 'Network is
> unreachable'
> returned when send_rtmsg tried to writev them)
> 2. kroute_remove brakes when prefix withdraw comes from rde (with 'Not
> handled AID')
> 
> I applied your patch and vpn routes now get installed to the fib!
> But kroute_remove cannot handle vpn prefixes withdrawal still.
> I manually triggered prefix withdraw on the other side of bgp session and
> hooked the prefix at kroute_remove just before it returned -1.
> "
> kroute_remove: rd 65001:100 10.42.200.9/32 NH ???
> kroute_remove: not handled AID
> "
> So I extend the patch abit and the issue 2 seems to go:
> (Not sure that I did it right. Also don't know if kf->nexthop = '???' is ok
> in kroute_remove during withdrawal, but fib reflects correctly.)
> 

It seems that was only part of the fix for withdraws. There is another bug
in the nlri parser where the code fails to properly jump over the implicit
label.

This diff seems to work for me. The util.c changes fix the problem when
parsing the MP_UNREACH_NLRI attribute.

I tested the v4 case but for v6 another can of worms showed up. It is not
possible to configure IPv6 addrs on mpe(4) which makes it impossible to
inject IPv6 VPN routes into the rdomain :(
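
For readers who do not want to dig through the diff, the withdraw handling can be sketched as follows. This is a standalone illustration, not the bgpd parser: the struct and function names are invented, and it assumes the usual VPNv4 NLRI layout of a one-byte length in bits, a 3-byte label field, an 8-byte route distinguisher and then the IPv4 prefix. The key point is that skipping the label means advancing the read position as well as reducing the bit count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for one parsed VPNv4 withdraw. */
struct vpn4_withdraw {
	uint64_t	rd;		/* route distinguisher, host order */
	uint8_t		prefixlen;	/* IPv4 prefix length in bits */
	uint8_t		prefix[4];	/* IPv4 prefix, zero padded */
};

/*
 * Parse one withdrawn VPNv4 NLRI from p (len bytes available).
 * Returns the number of bytes consumed, or -1 on malformed input.
 */
int
parse_vpn4_withdraw(const uint8_t *p, size_t len, struct vpn4_withdraw *out)
{
	size_t off = 0, nbytes;
	unsigned int bits;
	int i;

	assert(p != NULL && out != NULL);
	if (len < 1)
		return -1;
	bits = p[off++];		/* total NLRI length in bits */

	/* need at least the 3 byte label field and the 8 byte RD */
	if (bits < (3 + 8) * 8 || len - off < 3 + 8)
		return -1;

	/* withdraw: ignore the label, but still step over its 3 bytes */
	off += 3;
	bits -= 3 * 8;

	/* route distinguisher, big endian on the wire */
	out->rd = 0;
	for (i = 0; i < 8; i++)
		out->rd = (out->rd << 8) | p[off++];
	bits -= 8 * 8;

	if (bits > 32)
		return -1;
	nbytes = (bits + 7) / 8;
	if (len - off < nbytes)
		return -1;
	memset(out->prefix, 0, sizeof(out->prefix));
	memcpy(out->prefix, p + off, nbytes);
	off += nbytes;
	out->prefixlen = (uint8_t)bits;
	return (int)off;
}
```

If the `off += 3` step is left out, the label bytes are read as the start of the route distinguisher and everything after it is misparsed.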
-- 
:wq Claudio

Index: kroute.c
===
RCS file: /cvs/src/usr.sbin/bgpd/kroute.c,v
retrieving revision 1.301
diff -u -p -r1.301 kroute.c
--- kroute.c18 Oct 2022 09:30:29 -  1.301
+++ kroute.c4 Nov 2022 10:10:58 -
@@ -580,6 +580,9 @@ krVPN4_change(struct ktable *kt, struct 
(kf->prefix.labelstack[2] << 8);
mplslabel = htonl(mplslabel);
 
+   kf->flags |= F_MPLS;
+   kf->mplslabel = mplslabel;
+
/* for blackhole and reject routes nexthop needs to be 127.0.0.1 */
if (kf->flags & (F_BLACKHOLE|F_REJECT))
kf->nexthop.v4.s_addr = htonl(INADDR_LOOPBACK);
@@ -590,6 +593,7 @@ krVPN4_change(struct ktable *kt, struct 
return (-1);
} else {
kr->mplslabel = mplslabel;
+   kr->flags |= F_MPLS;
kr->ifindex = kf->ifindex;
kr->nexthop.s_addr = kf->nexthop.v4.s_addr;
rtlabel_unref(kr->labelid);
@@ -632,6 +636,9 @@ krVPN6_change(struct ktable *kt, struct 
(kf->prefix.labelstack[2] << 8);
mplslabel = htonl(mplslabel);
 
+   kf->flags |= F_MPLS;
+   kf->mplslabel = mplslabel;
+
/* for blackhole and reject routes nexthop needs to be ::1 */
if (kf->flags & (F_BLACKHOLE|F_REJECT))
memcpy(>nexthop.v6, , sizeof(kf->nexthop.v6));
@@ -642,6 +649,7 @@ krVPN6_change(struct ktable *kt, struct 
return (-1);
} else {
kr6->mplslabel = mplslabel;
+   kr6->flags |= F_MPLS;
kr6->ifindex = kf->ifindex;
memcpy(>nexthop, >nexthop.v6, sizeof(struct in6_addr));
kr6->nexthop_scope_id = kf->nexthop.scope_id;
@@ -1878,9 +1886,11 @@ kroute_remove(struct ktable *kt, struct 
 
switch (kf->prefix.aid) {
case AID_INET:
+   case AID_VPN_IPv4:
multipath = kroute4_remove(kt, kf, any);
break;
case AID_INET6:
+   case AID_VPN_IPv6:
multipath = kroute6_remove(kt, kf, any);
break;
default:
Index: util.c
===
RCS file: /cvs/src/usr.sbin/bgpd/util.c,v
retrieving revision 1.71
diff -u -p -r1.71 util.c
--- util.c  17 Aug 2022 15:15:26 -  1.71
+++ util.c  4 Nov 2022 10:08:24 -
@@ -131,7 +131,9 @@ log_rd(uint64_t rd)
snprintf(buf, sizeof(buf), "rd %s:%hu", inet_ntoa(addr), u16);
break;
default:
-   return ("rd ?");
+   snprintf(buf, sizeof(buf), "rd #%016llx",
+   (unsigned long long)rd);
+   break;
}
return (buf);
 }
@@ -596,6 +598,7 @@ nlri_get_vpn4(u_char *p, uint16_t len, s
return (-1);
if (withdraw) {
/* on withdraw ignore the labelstack all together */
+   p += 3;
plen += 3;
pfxlen -= 3 * 8;
break;
@@ -659,6 +662,7 @@ nlri_get_vpn6(u_char *p, uint16_t len, s
return (-1);
if (withdraw) {
/* on withdraw ignore the labelstack all together */
+   p += 3;
plen += 3;
pfxlen -= 3 * 8;
break;



Re: bgpd VPNs broken in kroute with 7.2?

2022-11-03 Thread Claudio Jeker
On Mon, Oct 31, 2022 at 09:54:12AM +0300, Bars Bars wrote:
> Hi!
> 
> Just upgraded to 7.2 and bgpd began to crash with VPNs, not immediately
> but in 1 minute after daemon start (probably the issue happens
> when prefix withdraw received or so, and rde goes to change the fib, not
> sure).
> If only using IPv4 sessions and keeping VPN sessions down then it works
> stable.
> "
> kroute_remove: not handled AID
> peer closed imsg connection
> SE: Lost connection to parent
> peer closed imsg connection notification: Cease, administratively down
> fatal in RTR: Lost connection to parent
> peer closed imsg connection
> fatal in RDE: Lost connection to parent
> "
> im not sure that is a bug, but there was huge kroute refactoring under bgpd
> source tree since 7.1 and it seems that routes with VPN4/VPN6 AIDs are
> now handled very differently. Im bad at code to
> investigate and to try to fix the issue, so i simply rolled back
> bgpd/bgpctl
> to 7.1 base revision and rebuild, ok now.
> Сan't imagine what else I can do.

Please try the following diff. It should fix the problem with MPLS routes.

-- 
:wq Claudio

Index: kroute.c
===
RCS file: /cvs/src/usr.sbin/bgpd/kroute.c,v
retrieving revision 1.301
diff -u -p -r1.301 kroute.c
--- kroute.c18 Oct 2022 09:30:29 -  1.301
+++ kroute.c3 Nov 2022 13:42:11 -
@@ -580,6 +580,9 @@ krVPN4_change(struct ktable *kt, struct 
(kf->prefix.labelstack[2] << 8);
mplslabel = htonl(mplslabel);
 
+   kf->mplslabel = mplslabel;
+   kf->flags |= F_MPLS;
+
/* for blackhole and reject routes nexthop needs to be 127.0.0.1 */
if (kf->flags & (F_BLACKHOLE|F_REJECT))
kf->nexthop.v4.s_addr = htonl(INADDR_LOOPBACK);
@@ -590,6 +593,7 @@ krVPN4_change(struct ktable *kt, struct 
return (-1);
} else {
kr->mplslabel = mplslabel;
+   kr->flags |= F_MPLS;
kr->ifindex = kf->ifindex;
kr->nexthop.s_addr = kf->nexthop.v4.s_addr;
rtlabel_unref(kr->labelid);
@@ -632,6 +636,9 @@ krVPN6_change(struct ktable *kt, struct 
(kf->prefix.labelstack[2] << 8);
mplslabel = htonl(mplslabel);
 
+   kf->flags |= F_MPLS;
+   kf->mplslabel = mplslabel;
+
/* for blackhole and reject routes nexthop needs to be ::1 */
if (kf->flags & (F_BLACKHOLE|F_REJECT))
memcpy(>nexthop.v6, , sizeof(kf->nexthop.v6));
@@ -642,6 +649,7 @@ krVPN6_change(struct ktable *kt, struct 
return (-1);
} else {
kr6->mplslabel = mplslabel;
+   kr6->flags |= F_MPLS;
kr6->ifindex = kf->ifindex;
memcpy(>nexthop, >nexthop.v6, sizeof(struct in6_addr));
kr6->nexthop_scope_id = kf->nexthop.scope_id;



Re: Triple booting Windows/Debian/OpenBSD?

2022-11-01 Thread Claudio Jeker
On Tue, Nov 01, 2022 at 02:20:38PM +, Ottavio Caruso wrote:
> Op 01/11/2022 om 13:16 schreef Claudio Jeker:
> > On Tue, Nov 01, 2022 at 12:42:10PM +, Maurice McCarthy wrote:
> > > I think you are asking for a world of grief.
> > Not really, just be careful when installing any additional OS on a
> > multiboot system. They like to trample on each others toes
> 
> Thanks.
> 
> Incidentally, is suspend/resume to RAM supposed to work on OpenBSD? Because
> it didn't work on NetBSD. I know they are two different ecospheres but you
> never know.
 
Generally suspend to RAM works fine. If not file a bug report.

-- 
:wq Claudio



Re: Triple booting Windows/Debian/OpenBSD?

2022-11-01 Thread Claudio Jeker
On Tue, Nov 01, 2022 at 12:42:10PM +, Maurice McCarthy wrote:
> I think you are asking for a world of grief.

Not really, just be careful when installing any additional OS on a
multiboot system. They like to trample on each other's toes.

In the OpenBSD installer be careful and do not select whole disk.
 
> sda5 is likely to be on an extended partition. That is trouble booting.

This is GPT and EFI. I had no trouble booting OpenBSD from large offsets.
Btw. you can use the linux efibootmgr to set a menu entry for OpenBSD.
With that you can use the boot menu to select what to boot.
 
> You cannot use the linux swap partition easily, though it might be
> possible, reformatting on change of operation system, ???!!!

I would not reuse swap partitions. Mainly because hibernate uses swap to
store the image. So if you hibernate and boot into a different OS that
would destroy your image.
 
> I'd advise against even trying. Unless you enjoy pain, that is.

Honestly there is no big issue if you're careful and have backups ready.
Sure it is far easier to install on individual disks but heck not every
system has that luxury. 

-- 
:wq Claudio



Re: bgpd loadbalancing feature

2022-08-13 Thread Claudio Jeker
On Sat, Aug 13, 2022 at 08:27:53AM +0200, Holger Glaess wrote:
> hi
> 
> 
> i need a little bit help to understand how i can check if
> 
> the new openbgpd do the loadbalancing
> 
> 
> wendehals# bgpctl sh nei 172.16.2.251
> BGP neighbor is 172.16.2.251, remote AS 65010
>   BGP version 4, remote router-id 172.16.2.251
>   BGP state = Established, up for 00:14:33
>   Last read 00:00:03, holdtime 90s, keepalive interval 30s
>   Last write 00:00:03
>   Neighbor capabilities:
>     Multiprotocol extensions: IPv4 unicast, IPv4 vpn
>     4-byte AS numbers
>     Route Refresh
>     Graceful Restart
>     Add-path: IPv4 unicast bidir, IPv4 vpn bidir
>   Negotiated capabilities:
>     Multiprotocol extensions: IPv4 unicast, IPv4 vpn
>     4-byte AS numbers
>     Route Refresh
>     Graceful Restart
>     Add-path: IPv4 unicast bidir, IPv4 vpn bidir
> 
>   Message statistics:
>   Sent   Received
>   Opens    3  3
>   Notifications    0  2
>   Updates 26 25
>   Keepalives  98 98
>   Route Refresh    0  0
>   Total  127    128
> 
>   Update statistics:
>   Sent   Received
>   Prefixes 9  9
>   Updates  9  9
>   Withdraws    0  0
>   End-of-Rib   2  2
>   Route Refresh statistics:
>   Request  0  0
>   Begin-of-RR  0  0
>   End-of-RR    0  0
> 
>   Last received shutdown reason: "bgpd shutting down"
>   Local host:  172.16.2.252, Local port:    179
>   Remote host: 172.16.2.251, Remote port: 48848
> 
> 
> mean the bidir flag my neibgbor have loadbalancing configured and active ?
> 
> 
> wendehals# bgpctl sh rib
> flags: * = Valid, > = Selected, I = via IBGP, A = Announced,
>    S = Stale, E = Error
> origin validation state: N = not-found, V = valid, ! = invalid
> origin: i = IGP, e = EGP, ? = Incomplete
> 
> flags ovs destination  gateway  lpref   med aspath origin
> *>  N 172.16.1.1/32    172.16.12.5   100 0 65100 i
> *m  N 172.16.1.1/32    172.16.13.1   100 1 65101 i
> I*  N 172.16.1.1/32    172.16.13.5   100 1 65101 i
> I*  N 172.16.1.1/32    172.16.12.1   100 0 65100 i
> *>  N 172.16.1.2/32    172.16.12.5   100 1 65100 i
> *m  N 172.16.1.2/32    172.16.13.1   100 0 65101 i
> I*  N 172.16.1.2/32    172.16.13.5   100 0 65101 i
> I*  N 172.16.1.2/32    172.16.12.1   100 1 65100 i
> I*> N 172.16.2.251/32  172.16.2.251  100 0 i
> *   N 172.16.2.251/32  172.16.12.5   100    11 65100 i
> *   N 172.16.2.251/32  172.16.13.1   100    11 65101 i
> I*  N 172.16.2.251/32  172.16.12.2   100    11 65100 i
> I*  N 172.16.2.251/32  172.16.13.6   100    11 65101 i
> AI*>    N 172.16.2.252/32  0.0.0.0   100 0 i
> *   N 172.16.2.252/32  172.16.12.6   100    11 65100 i
> *   N 172.16.2.252/32  172.16.13.2   100    11 65101 i
> I*  N 172.16.2.252/32  172.16.13.5   100    11 65101 i
> I*  N 172.16.2.252/32  172.16.12.1   100    11 65100 i
> 
> 
> i see 2 paths with asterisk to 172.16.2.252 ,
> 
> this should say loadblancing is active ?
> 
> 
> is there an other opportunity  to check if the bgpd have loadblancing active
> ?
> 

If by load balancing you mean ECMP routing, then no, bgpd does not
support equal cost multipath. Only one path (the best, marked '>') is used
for forwarding traffic.

There is slow work ongoing to make ECMP happen but don't expect it anytime
soon.
-- 
:wq Claudio



Re: bridge rules are evaluated different compared to pf?

2022-07-26 Thread Claudio Jeker
On Tue, Jul 26, 2022 at 11:18:06AM +0300, Cristian Danila wrote:
> Good day!
> I hope someone could clarify if the following behavior is
> expected in a bridge configuration
> I have following rules added in hostname.bridge0
> 
> ---
> #this will result out to be blocked
> rule block in on vic0
> rule block out on vic0
> rule pass out on vic0
> 
> #this will result out to be passed
> #rule block in on vic0
> #rule pass out on vic0
> #rule block out on vic0
> 
> As you see in comments the uncommented section will block out
> traffic and second section will let it pass it. Somehow these
> rules behaves like rules added to pf but with 'quick' keyword.
> So I deduce that a catch all policy must be added last and not
> first like in pf
> 
> In manpage of ifconfig I see this:
> "Rules are processed in the order in which they were added to
> the interface"
> So I believe it makes sense the behavior but I just want to
> confirm with you this behavior as I read in a book(Building
> Firewalls With OpenBSD And PF) the opposite:
> 
> "rule block out on ne1
> rule pass out on ne1 src 00:00:00:00:00:01
> rule pass out on ne1 src 00:00:00:00:00:02
> rule pass out on ne1 src 00:00:00:00:00:03
> Please note that the last matching rule wins, hence the
> global block or pass rule should be listed before more
> specific rules."
> 
> I would like to understand if the book has a mistake or I do
> something wrong.

The manpage actually has a bit more:
 Rules are processed in the order in which they were added to the
 interface.  The first rule matched takes the action ...

So the book got this wrong. bridge(4) uses a first match logic unlike
pf(4) where last match is the default.
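
The difference between the two evaluation orders can be sketched in a few lines. This is a toy model, not the bridge(4) or pf(4) implementation; the types and function names are invented for the example and rules only match on direction.

```c
#include <assert.h>
#include <stddef.h>

enum action { ACT_PASS, ACT_BLOCK };
enum dir { DIR_IN, DIR_OUT };

/* Hypothetical rule: an action that applies to one direction. */
struct rule {
	enum action	action;
	enum dir	dir;
};

/* bridge-style evaluation: the first matching rule decides. */
enum action
eval_first_match(const struct rule *rules, size_t n, enum dir d,
    enum action dflt)
{
	size_t i;

	assert(rules != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (rules[i].dir == d)
			return rules[i].action;
	return dflt;
}

/* pf-style evaluation without "quick": the last matching rule wins. */
enum action
eval_last_match(const struct rule *rules, size_t n, enum dir d,
    enum action dflt)
{
	enum action a = dflt;
	size_t i;

	assert(rules != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (rules[i].dir == d)
			a = rules[i].action;
	return a;
}
```

Fed with the first ruleset from the question (block in, block out, pass out), the first-match variant blocks outgoing traffic while the last-match variant passes it, which matches the observed behaviour.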

-- 
:wq Claudio



Re: RFC7432 (EVPN)

2022-07-18 Thread Claudio Jeker
On Sun, Jul 17, 2022 at 09:13:52PM +0200, Holger Glaess wrote:
> hi
> 
> 
> is there an plan or think about it to implement this RFC ?
> 

There is no plan to do this work. It will require a good amount of work
to do this properly. There is currently no way to program the LUT of
veb(4) / bridge(4) from userland. For bgpd this would need to happen via
the routing socket. The BGP NLRI encoding will also require some thoughts.

First step would probably be the implementation of the various extended
communities then add the NLRI bits so that the AFI/SAFI support can be
added to bgpd. At that point you can use bgpd as a route reflector. As
said above "FIB" support will require a good amount of work both in bgpd
and in the kernel.

-- 
:wq Claudio



Re: OpenBGPD via (WG?) Tunnel Not Learning Routes

2022-07-13 Thread Claudio Jeker
On Wed, Jul 13, 2022 at 11:01:09AM -, Stuart Henderson wrote:
> On 2022-07-13, Tobias Fiebig  wrote:
> > Heho,
> >
> > When doing what i described in my message, I get the below messages.
> >
> > When I set static routes, packet forwarding works fine, i.e.:
> >
> > gw02.dus01.as59645.net ~ # route add -inet6 2a06:d1c2::/48 
> > 2a06:d1c0::dead:beef:c02 
> > add net 2a06:d1c2::/48: gateway 2a06:d1c0::dead:beef:c02
> >
> > bgp-test.test /etc # route add -inet6 default 2a06:d1c0::dead:beef:c01
> > add net default: gateway 2a06:d1c0::dead:beef:c01
> >
> > Removing those routes and restarting the BGPD then also leads to a 
> > successful import of routes, see bgpctl sh nex at the bottom of this mail.
> >
> > It somehow feels like bgpd does not register that wg0 came up.
> 
> Yes.
> 
> You can check with "route -n monitor" that the route messages are correctly
> sent when the interface is brought up, also try running bgpd in the foreground
> with debug logging (bgpd -vvvd or so) and see if any errors/warnings are
> logged when wg comes up.

Looking at the show nexthop output it seems bgpd does not get the
RTM_IFINFO message with the IFF_UP flag set. It still thinks the interface
is down. This is a bug in wg(4) which probably sends the rt message before
applying the flag.
 
> > Let me try if this behavior is the same for other tunnels (eoip).
> 
> Worth a try. Also maybe different between v4 and v6, WireGuard doesn't really
> do v6 properly.

The v4 part is also not great to be honest. Doing dynamic routing via
WireGuard is just close to impossible with the way WireGuard is specified.
It is not a simple tunnel but applies some route limits on top which you
can't really disable.

Also because of multicast issues you can't run ospfd over wg(4) so I had
to put a gif tunnel in a wg tunnel to have dynamic routing.

-- 
:wq Claudio



Re: Cron running at 99% CPU for seemingly no reason

2022-06-27 Thread Claudio Jeker
On Sun, Jun 19, 2022 at 01:26:27PM +0200, Stephan Mending wrote:
> Hi, 
> it crashed again. 
> Here is the dmesg, this time the kernel had debugging symbols enabled. 
> 
> [...]
> ic0 at ichiic0
> spdmem0 at iic0 addr 0x50: 2GB DDR3 SDRAM PC3-12800 SO-DIMM
> isa0 at pcib0
> isadma0 at isa0
> vga0 at isa0 port 0x3b0/48 iomem 0xa/131072
> wsdisplay at vga0 not configured
> pcppi0 at isa0 port 0x61
> spkr0 at pcppi0
> wbsio0 at isa0 port 0x2e/2: NCT5104D rev 0x53
> wbsio0 port 0xa10/2 not configured
> vmm0 at mainbus0: VMX/EPT
> run0 at uhub0 port 4 configuration 1 interface 0 "Ralink 802.11 n WLAN" rev 
> 2.0
> 0/1.01 addr 2
> run0: MAC/BBP RT5592 (rev 0x0222), RF RT5592 (MIMO 2T2R), address 
> d8:61:62:37:5
> 6:c8   
> uhub2 at uhub1 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" 
> rev
>  2.00/0.04 addr 2
>  vscsi0 at root
>  scsibus2 at vscsi0: 256 targets
>  softraid0 at root
>  scsibus3 at softraid0: 256 targets
>  root on sd0a (7ec83d15890e2a71.a) swap on sd0b dump on sd0b
>  inteldrm0: 1024x768, 32bpp
>  wsdisplay0 at inteldrm0 mux 1
>  wsdisplay0: screen 0-5 added (std, vt100 emulation)
>  kernel: protection fault trap, code=0
>  Stopped at  icmp_mtudisc_timeout+0x77 
> [/usr/src/sys/netinet/ip_icmp.c:1072]
>  :   movq0(%rax),%rcx
>  ddb{0}> ddb{0}>
>  ddb{0}> bt  
>  icmp_mtudisc_timeout(fd807a4e0620,0) at icmp_mtudisc_timeout+0x77 
> [/usr/src
>  /sys/netinet/ip_icmp.c:1072]
>  rt_timer_timer(82324248) at rt_timer_timer+0x1cc 
> [/usr/src/sys/net/rout
>  e.c:1551]
>  softclock_thread(8000f260) at softclock_thread+0x13b 
> [/usr/src/sys/kern
>  /kern_timeout.c:681]
>  end trace frame: 0x0, count: -3
>  ddb{0}> call db_show_rtentry(fd807a4e0620, 0, 0)  
>  Symbol not found
> 
> I'd love to know whats going wrong here.
> 

This is a race condition in the rttimer code that was introduced by bluhm@
when he added the mutex around the global list.
Can you try the diff below? It is a refactor I did some time ago which
changes this and uses a per route timeout instead of the global one. With
this we should not have this use-after-free anymore.

-- 
:wq Claudio

Index: net/route.c
===
RCS file: /cvs/src/sys/net/route.c,v
retrieving revision 1.410
diff -u -p -r1.410 route.c
--- net/route.c 5 May 2022 13:57:40 -   1.410
+++ net/route.c 13 May 2022 11:49:00 -
@@ -1361,7 +1361,16 @@ rt_ifa_purge_walker(struct rtentry *rt, 
  */
 
 struct mutex   rttimer_mtx;
-LIST_HEAD(, rttimer_queue) rttimer_queue_head; /* [T] */
+
+struct rttimer {
+   TAILQ_ENTRY(rttimer)rtt_next;   /* [T] entry on timer queue */
+   LIST_ENTRY(rttimer) rtt_link;   /* [T] timers per rtentry */
+   struct timeout  rtt_timeout;/* [I] timeout for this entry */
+   struct rttimer_queue*rtt_queue; /* [I] back pointer to queue */
+   struct rtentry  *rtt_rt;/* [T] back pointer to route */
+   time_t  rtt_expire; /* [I] rt expire time */
+   u_int   rtt_tableid;/* [I] rtable id of rtt_rt */
+};
 
 #define RTTIMER_CALLOUT(r) {   \
if (r->rtt_queue->rtq_func != NULL) {   \
@@ -1388,15 +1397,9 @@ LIST_HEAD(, rttimer_queue)   rttimer_queue
 void
 rt_timer_init(void)
 {
-   static struct timeout   rt_timer_timeout;
-
pool_init(&rttimer_pool, sizeof(struct rttimer), 0,
IPL_MPFLOOR, 0, "rttmr", NULL);
-
mtx_init(&rttimer_mtx, IPL_MPFLOOR);
-   LIST_INIT(&rttimer_queue_head);
-   timeout_set_proc(&rt_timer_timeout, rt_timer_timer, &rt_timer_timeout);
-   timeout_add_sec(&rt_timer_timeout, 1);
 }
 
 void
@@ -1407,10 +1410,6 @@ rt_timer_queue_init(struct rttimer_queue
rtq->rtq_count = 0;
rtq->rtq_func = func;
TAILQ_INIT(&rtq->rtq_head);
-
-   mtx_enter(&rttimer_mtx);
-   LIST_INSERT_HEAD(&rttimer_queue_head, rtq, rtq_link);
-   mtx_leave(&rttimer_mtx);
 }
 
 void
@@ -1453,6 +1452,25 @@ rt_timer_queue_count(struct rttimer_queu
return (rtq->rtq_count);
 }
 
+static inline struct rttimer *
+rt_timer_unlink(struct rttimer *r)
+{
+   MUTEX_ASSERT_LOCKED(&rttimer_mtx);
+
+   LIST_REMOVE(r, rtt_link);
+   r->rtt_rt = NULL;
+
+   if (timeout_del(&r->rtt_timeout) == 0) {
+   /* timeout fired, so rt_timer_timer will do the cleanup */
+   return NULL;
+   }
+
+   TAILQ_REMOVE(&r->rtt_queue->rtq_head, r, rtt_next);
+   KASSERT(r->rtt_queue->rtq_count > 0);
+   r->rtt_queue->rtq_count--;
+   return r;
+}
+
 void
 rt_timer_remove_all(struct rtentry *rt)
 {
@@ -1462,11 +1480,9 @@ rt_timer_remove_all(struct rtentry *rt)
TAILQ_INIT();
mtx_enter(&rttimer_mtx);
while ((r = LIST_FIRST(&rt->rt_timer)) != NULL) {
-   LIST_REMOVE(r, rtt_link);
-   TAILQ_REMOVE(&r->rtt_queue->rtq_head, r, rtt_next);
-   

Re: Resizing encrypted disk

2022-06-25 Thread Claudio Jeker
On Sun, Jun 26, 2022 at 04:25:56AM +0100, Chris Narkiewicz wrote:
> I have a network-attached block device that is used
> as an encrypted device:
> 
> bioctl -c C -l /dev/sd1a -p /keydisk softraid0
> 
> Underlying volume is about to be resized, but I can't resize
> the decrypted volume. Here is what I did using a USB stick
> (as a dress rehearsal):
> 
> 1. 32GB usb stick, no gpt or mbr parts
> 2. disklabel -E sd1
> 3. added partition a of type RAID
> 4. bioctl -c C -l /dev/sd1a softraid0 -> sd2 appears
> 5. disklabel -E sd2
> 6. newfs sd2a
> 7. bioctl -d sd2
> 
> Now, I modified sd1a partition by growing it.
> When I attach the volume using bioctl, it mounts,
> but disklabel -v sd2 shows the same number of sectors and
> and I'm unable to grow the decrypted partition sd2a to fill sd1a.
> 
> Is this workflow supported? I'd be thankful for any advice.

You can have a look at https://github.com/cjeker/growsoftraid
it should allow you to update the softraid metadata to grow the size to
the one of the RAID partition.

It worked for me but make sure you have a backup ready in case something
goes wrong.

-- 
:wq Claudio



Re: tail(1) with multiple FIFOs

2022-06-09 Thread Claudio Jeker
On Thu, Jun 09, 2022 at 02:34:27PM +0200, Martijn van Duren wrote:
> The "problem" is that a FIFO without data hangs on open(2), until data
> is available, the same goes for the initial read of the file.
> 
> We could work around this by adding the O_NONBLOCK flag to a separate
> open(2) call, but I know that this flag is frowned upon. Since gnu tail
> shows the same behaviour I'm not sure if it's worth doing.
> 
> Since I'm not claiming to know all the edge-cases of O_NONBLOCK you
> could carry this diff locally at your own risk. Or maybe if other
> developers feel strong about this and are braver than me when it comes
> to O_NONBLOCK it might go somewhere.

But this does not really work the way people would expect. This will just
jump over FIFOs that have no writer. It will not start to report messages
from those FIFOs when a writer connects later on.

A proper fix would probably need some kqueue magic to get an event when a
FIFO becomes readable.

Since tail -f on FIFOs is a very uncommon operation I see no need to write
a lot of code to support this questionable mode of operation.
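
To illustrate the two points above, here is a minimal sketch (not a patch
for tail(1)). It opens a FIFO with O_NONBLOCK so that open(2) does not hang,
and then waits for the FIFO to become readable. It uses poll(2) as a
portable stand-in for the kqueue approach. The helper names fifo_open_nb and
fifo_wait_readable are made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open a FIFO for reading without blocking in open(2). */
int
fifo_open_nb(const char *path)
{
	assert(path != NULL);
	return open(path, O_RDONLY | O_NONBLOCK);
}

/*
 * Hypothetical helper: wait up to timeout_ms milliseconds for the FIFO
 * to become readable.  Returns 1 if data can be read, 0 otherwise
 * (timeout, error, or only a hangup was reported).
 */
int
fifo_wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	if (poll(&pfd, 1, timeout_ms) <= 0)
		return 0;
	return (pfd.revents & POLLIN) != 0;
}
```

With no writer connected, a read(2) on the descriptor returned by
fifo_open_nb() just reports end-of-file. A tool that simply moves on at that
point will therefore skip the FIFO, whereas waiting with
fifo_wait_readable() picks up a writer that connects later.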

-- 
:wq Claudio
 
> martijn@
> 
> On Wed, 2022-06-08 at 22:09 -0400, Philippe Meunier wrote:
> > Hi,
> > 
> > Try:
> > 
> > $ mkfifo fifo1 fifo2
> > $ tail -f fifo1 fifo2
> > 
> > Then in another terminal:
> > 
> > $ while true; do /bin/echo  > fifo1; done
> > 
> > and... nothing happens.  I would have expected tail(1) to start showing the
> > content of fifo1 as soon as content became available but no, it just keeps
> > waiting.
> > 
> > Then in another terminal:
> > 
> > $ while true; do /bin/echo  > fifo2; done
> > 
> > and then tail(1) starts showing output as expected, alternating between
> > fifo1 and fifo2.
> > 
> > The interesting part is that, once tail(1) has started producing output,
> > you can interrupt and restart one or both of the "" and / or ""
> > loops and tail(1) always does what you'd expect.  It seems that it's only
> > at the very start that tail(1) doesn't produce any output until content is
> > available in both fifos.
> > 
> > I tried various things like -n 0 and -c 0 but to no avail.
> > 
> > Another interesting thing to try:
> > - start the "" loop
> > - interrupt the "" loop
> > - start the "" loop
> > and tail(1) starts displaying output.
> > 
> > But if you try:
> > - start the "" loop
> > - interrupt the "" loop
> > - start the "" loop
> > then tail(1) still doesn't show any output, until you start the "" loop
> > for a second time!
> > 
> > So at the very start, not only does tail(1) seem to expect content in both
> > fifos before it start showing output, but it also seems to expect the
> > content to appear in the specific order indicated on the tail(1) command
> > line.
> > 
> > I assume this is a bug in tail(1)?
> > 
> > Cheers,
> > 
> > Philippe
> > 
> > 
> Index: tail.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/tail/tail.c,v
> retrieving revision 1.22
> diff -u -p -r1.22 tail.c
> --- tail.c 4 Jan 2019 15:04:28 -0000   1.22
> +++ tail.c 9 Jun 2022 12:33:24 -0000
> @@ -37,6 +37,7 @@
>  
>  #include <err.h>
>  #include <errno.h>
> +#include <fcntl.h>
>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <string.h>
> @@ -56,7 +57,7 @@ main(int argc, char *argv[])
>   off_t off = 0;
>   enum STYLE style;
>   int ch;
> - int i;
> + int i, fd;
>   char *p;
>  
>   if (pledge("stdio rpath", NULL) == -1)
> @@ -154,8 +155,12 @@ main(int argc, char *argv[])
>   if (argc) {
>   for (i = 0; *argv; i++) {
>   tf[i].fname = *argv++;
> - if ((tf[i].fp = fopen(tf[i].fname, "r")) == NULL ||
> - fstat(fileno(tf[i].fp), &(tf[i].sb))) {
> + /*
> +  * Use O_NONBLOCK to avoid hanging on FIFO.
> +  */
> + fd = open(tf[i].fname, O_RDONLY | O_NONBLOCK);
> + if (fd == -1 || (tf[i].fp = fdopen(fd, "r")) == NULL ||
> + fstat(fd, &(tf[i].sb))) {
>   ierr(tf[i].fname);
>   i--;
>   continue;
> 



Re: Cron running at 99% CPU for seemingly no reason

2022-05-15 Thread Claudio Jeker
On Sun, May 15, 2022 at 12:06:33PM +0200, Stephan Mending wrote:
> Hi *, 
> I've got a system running -current that keeps crashing on me every couple of 
> days. 
> Output of ddb: 
> 
> Connected to /dev/cuaU0 (speed 115200)
> 
> ddb{0}> show panic
> the kernel did not panic
> ddb{0}> show uvm
> Current UVM status:
>   pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
>   482451 VM pages: 43158 active, 132795 inactive, 35 wired, 192336 free (24054 zero)
>   min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
>   freemin=16081, free-target=21441, inactive-target=0, wired-max=160817
>   faults=2487210, traps=2404140, intrs=211883, ctxswitch=1960560 fpuswitch=0
>   softint=3499069, syscalls=2015497, kmapent=9
>   fault counts:
> noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
> ok relocks(total)=192470(193514), anget(retries)=603205(0), amapcopy=177151
> neighbor anon/obj pg=82033/639788, gets(lock/unlock)=415897/193548
> cases: anon=570367, anoncow=32838, obj=347149, prcopy=67670, przero=1469152
>   daemon and swap counts:
> woke=0, revs=0, scans=0, obscans=0, anscans=0
> busy=0, freed=0, reactivate=0, deactivate=0
> pageouts=0, pending=0, nswget=0
> nswapdev=1
> swpages=526020, swpginuse=0, swpgonly=0 paging=0
>   kernel pointers:
> objs(kern)=0x8238a038
> ddb{0}> show trace
> No such command
> ddb{0}> trace
> icmp_mtudisc_timeout(fd807a50b070,0) at icmp_mtudisc_timeout+0x77
> rt_timer_timer(8235d668) at rt_timer_timer+0x1cc
> softclock_thread(8000f260) at softclock_thread+0x13b
> end trace frame: 0x0, count: -3
> ddb{0}> 
> 
> Output of a second crash: 
> 
> ddb{0}> show panic
> the kernel did not panic
> ddb{0}> trace
> icmp_mtudisc_timeout(fd8069f9f700,0) at icmp_mtudisc_timeout+0x77
> rt_timer_timer(8231bfc8) at rt_timer_timer+0x1cc
> softclock_thread(8000f500) at softclock_thread+0x13b
> end trace frame: 0x0, count: -3
> ddb{0}> show uvm
> Current UVM status:
>   pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
>   482457 VM pages: 29240 active, 133535 inactive, 35 wired, 205028 free (25630 zero)
>   min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
>   freemin=16081, free-target=21441, inactive-target=0, wired-max=160819
>   faults=687274, traps=693441, intrs=75204, ctxswitch=381252 fpuswitch=0
>   softint=615411, syscalls=607703, kmapent=9
>   fault counts:
> noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
> ok relocks(total)=185433(186477), anget(retries)=141598(0), amapcopy=75047
> neighbor anon/obj pg=69895/201703, gets(lock/unlock)=256502/186509
> cases: anon=114948, anoncow=26650, obj=237702, prcopy=17724, przero=290216
>   daemon and swap counts:
> woke=0, revs=0, scans=0, obscans=0, anscans=0
> busy=0, freed=0, reactivate=0, deactivate=0
> pageouts=0, pending=0, nswget=0
> nswapdev=1
> swpages=526020, swpginuse=0, swpgonly=0 paging=0
>   kernel pointers:
> objs(kern)=0x82317458
> ddb{0}> show bcstats
> Current Buffer Cache status:
> numbufs 24114 busymapped 0, delwri 5
> kvaslots 6030 avail kva slots 6030
> bufpages 96426, dmapages 96426, dirtypages 20
> pendingreads 0, pendingwrites 0
> highflips 0, highflops 0, dmaflips 0
> ddb{0}> mount
> No such command
> ddb{0}> trace
> icmp_mtudisc_timeout(fd8069f9f700,0) at icmp_mtudisc_timeout+0x77
> rt_timer_timer(8231bfc8) at rt_timer_timer+0x1cc
> softclock_thread(8000f500) at softclock_thread+0x13b
> end trace frame: 0x0, count: -3
> 
> 
> 
> Especially the line stating "the kernel did not panic" surprises me, as I am
> greeted by the kernel debugger. Not sure how to interpret that.
> While looking for the reason behind these "crashes", I noticed that cron is
> constantly running at 99% cpu.
> 
> As a first measure I commented out all cronjobs in place (except for daily,
> weekly and monthly, as I figured these shouldn't pose a problem). But that
> did not remedy the problem. Right after startup cron starts eating away at
> the cpu. Does anybody have an idea how to further analyze the issue (apart
> from giving it a go by recompiling cron and using gdb)?
> 

Also for cron, please attach ktrace to the cron process for a few seconds
and look at the kdump of that. Most probably it is constantly woken up for
some reason.

-- 
:wq Claudio



Re: Cron running at 99% CPU for seemingly no reason

2022-05-15 Thread Claudio Jeker
On Sun, May 15, 2022 at 12:06:33PM +0200, Stephan Mending wrote:
> Hi *, 
> I've got a system running -current that keeps crashing on me every couple of 
> days. 
> Output of ddb: 
> 
> Connected to /dev/cuaU0 (speed 115200)
> 
> ddb{0}> show panic
> the kernel did not panic
> ddb{0}> show uvm
> Current UVM status:
>   pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
>   482451 VM pages: 43158 active, 132795 inactive, 35 wired, 192336 free (24054 zero)
>   min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
>   freemin=16081, free-target=21441, inactive-target=0, wired-max=160817
>   faults=2487210, traps=2404140, intrs=211883, ctxswitch=1960560 fpuswitch=0
>   softint=3499069, syscalls=2015497, kmapent=9
>   fault counts:
> noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
> ok relocks(total)=192470(193514), anget(retries)=603205(0), amapcopy=177151
> neighbor anon/obj pg=82033/639788, gets(lock/unlock)=415897/193548
> cases: anon=570367, anoncow=32838, obj=347149, prcopy=67670, przero=1469152
>   daemon and swap counts:
> woke=0, revs=0, scans=0, obscans=0, anscans=0
> busy=0, freed=0, reactivate=0, deactivate=0
> pageouts=0, pending=0, nswget=0
> nswapdev=1
> swpages=526020, swpginuse=0, swpgonly=0 paging=0
>   kernel pointers:
> objs(kern)=0x8238a038
> ddb{0}> show trace
> No such command
> ddb{0}> trace
> icmp_mtudisc_timeout(fd807a50b070,0) at icmp_mtudisc_timeout+0x77
> rt_timer_timer(8235d668) at rt_timer_timer+0x1cc
> softclock_thread(8000f260) at softclock_thread+0x13b
> end trace frame: 0x0, count: -3
> ddb{0}> 
> 

This looks like some bad memory access. It is a fault and not really a
panic, which is why 'show panic' returns 'the kernel did not panic'.

The crash in icmp_mtudisc_timeout() points to some error in the rttimer
code refactor I made. Please try a newer snapshot and include the dmesg.

If it happens again, a call to db_show_rtentry in ddb may help to better
understand what is going on:

    call db_show_rtentry(fd807a50b070, 0, 0)

where the first argument is taken from the first argument of
icmp_mtudisc_timeout in the trace.

-- 
:wq Claudio



Re: OpenBGPd: fatal in RDE: aspath_get: Cannot allocate memory

2022-04-04 Thread Claudio Jeker
On Tue, Mar 29, 2022 at 09:53:56AM +0200, Laurent CARON wrote:
> Hi,
> 
> I'm happily running several OpenBGPd routers (OpenBSD 7.0).
> 
> After having applied the following filters (to blackhole traffic from
> certain countries):
> 
> include "/etc/bgpd/deny-asn.ru.bgpd"
> include "/etc/bgpd/deny-asn.by.bgpd"
> include "/etc/bgpd/deny-asn.ua.bgpd"
> 
> 
> # head /etc/bgpd/deny-asn.ru.bgpd
> match from any AS 2148 set { localpref 250 nexthop blackhole }
> match from any AS 2585 set { localpref 250 nexthop blackhole }
> match from any AS 2587 set { localpref 250 nexthop blackhole }
> match from any AS 2599 set { localpref 250 nexthop blackhole }
> match from any AS 2766 set { localpref 250 nexthop blackhole }
> match from any AS 2848 set { localpref 250 nexthop blackhole }
> match from any AS 2854 set { localpref 250 nexthop blackhole }
> match from any AS 2875 set { localpref 250 nexthop blackhole }
> match from any AS 2878 set { localpref 250 nexthop blackhole }
> match from any AS 2895 set { localpref 250 nexthop blackhole }
> 

You should really use as-set for this:

as-set ru-set { 2148 2585 2587 ... }

Also, do not match from any (at least I think you don't really want that to
match on ibgp sessions):

match from ebgp AS as-set ru-set set { localpref 250 nexthop blackhole }

If done right you can replace all your rules with a single one.
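
To give a rough idea of why a set lookup scales better than thousands of
single-AS rules, here is a small sketch. It is not bgpd's actual code; the
names as_set_prepare, as_in_set and asn_cmp are made up. Membership in a
sorted list of AS numbers is one binary search, however many ASNs the set
holds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort/bsearch comparator for 32-bit AS numbers. */
static int
asn_cmp(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/* Sort the set once, e.g. when the configuration is loaded. */
void
as_set_prepare(uint32_t *set, size_t n)
{
	assert(set != NULL && n > 0);
	qsort(set, n, sizeof(set[0]), asn_cmp);
}

/* Returns 1 if asn is a member of the sorted set, 0 otherwise. */
int
as_in_set(const uint32_t *set, size_t n, uint32_t asn)
{
	assert(set != NULL && n > 0);
	return bsearch(&asn, set, n, sizeof(set[0]), asn_cmp) != NULL;
}
```

For a set of about 8000 ASNs that is roughly 13 comparisons per lookup,
instead of checking one rule per ASN.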

-- 
:wq Claudio



Re: OpenBGPd: fatal in RDE: aspath_get: Cannot allocate memory

2022-04-04 Thread Claudio Jeker
On Mon, Apr 04, 2022 at 03:14:35PM +0200, Laurent CARON wrote:
> 
> Le 01/04/2022 à 14:38, Claudio Jeker a écrit :
> > 
> > The numbers look reasonable with maybe the exception of prefix and BGP
> > path attrs. Unless this system is pushing or pulling lots of full feeds to
> > peers I would not expect such a high number of prefixes. Also the number
> > of path attributes is high but that could again be reasonable if many
> > different full feeds are involved.
> 
> Hi Claudio,
> 
> This box is terminating 3 full IPv4 + 3 full IPv6 feeds + a few dozen IX
> sessions in addition to 5 IPv4 + 5 IPv6 iBGP connections.

3GB is not enough for such a busy system. You need to increase your limit;
5GB is probably enough.
 
> > > I'm not sure why the processes gets killed at around 3GB. Feels like you
> > > hit the ulimit. See Stuart's mail about how to look into that.
> > > So looking at this output I feel like you somehow created a BGP update
> > > loop where one or more systems are constantly sending UPDATEs to each
> > > other because the moment the update is processed the route decision
> > > changes and flaps back resulting in a withdraw or update.
> 
> I sincerely think it is not related to a BGP update loop because the issue
> is only triggered when adding the following filters:
> 
> include "/etc/bgpd/deny-asn.ru.bgpd"
> include "/etc/bgpd/deny-asn.by.bgpd"
> include "/etc/bgpd/deny-asn.ua.bgpd"
> 
> for a total of 8265 rules
> 
> I'll try to dig further.

If you deny ASNs, then please use an as-set instead of individual rules.

-- 
:wq Claudio



Re: OpenBGPd: fatal in RDE: aspath_get: Cannot allocate memory

2022-04-01 Thread Claudio Jeker
On Thu, Mar 31, 2022 at 09:06:05PM +0200, Laurent CARON wrote:
> Le 29/03/2022 à 12:10, Claudio Jeker a écrit :
> > I doubt it is the filters. You run into some sort of memory leak. Please
> > monitor 'bgpctl show rib mem' output. Also check ps aux | grep bgpd output
> > to see why and when the memory starts to go up.
> > With that information it may be possible to figure out where this leak
> > sits and how to fix it.
> > 
> > Cheers
> 
> 
> Hi Claudio,
> 
> Please find the output of 'bgpctl show rib mem' just 1 minute before the
> crash:
> 
> cat 2022-03-30::15:07:01.mem
> RDE memory statistics
> 909685 IPv4 unicast network entries using 34.7M of memory
> 272248 IPv6 unicast network entries using 14.5M of memory
>2363169 rib entries using 144M of memory
>   14616410 prefix entries using 1.7G of memory
>1539060 BGP path attribute entries using 106M of memory
>and holding 14616410 references
> 635275 BGP AS-PATH attribute entries using 33.7M of memory
>and holding 1539060 references
>  47399 entries for 681150 BGP communities using 15.1M of memory
>and holding 14616410 references
>  22139 BGP attributes entries using 865K of memory
>and holding 3436885 references
>  22138 BGP attributes using 175K of memory
> 270121 as-set elements in 249193 tables using 9.7M of memory
> 452138 prefix-set elements using 19.0M of memory
> RIB using 2.1G of memory
> Sets using 28.7M of memory
> 
> RDE hash statistics
> path hash: size 131072, 1539060 entries
> min 0 max 31 avg/std-dev = 11.742/3.623
> aspath hash: size 131072, 635275 entries
> min 0 max 16 avg/std-dev = 4.847/2.123
> comm hash: size 16384, 47399 entries
> min 0 max 12 avg/std-dev = 2.893/1.622
> attr hash: size 16384, 22139 entries
> min 0 max 8 avg/std-dev = 1.351/1.084

The numbers look reasonable with maybe the exception of prefix and BGP
path attrs. Unless this system is pushing or pulling lots of full feeds to
peers I would not expect such a high number of prefixes. Also the number
of path attributes is high but that could again be reasonable if many
different full feeds are involved.
 
> Here is the output of 'ps aux | grep bgp' one minute before the crash:
> 
> _bgpd    25479 100.1 40.1 33547416 33620192 ??  Rp/2   Tue09AM 1755:38.49 bgpd: route
> _bgpd     8696  31.6  0.0    15800    13240 ??  Sp     Tue09AM  626:35.66 bgpd: sessio
> _bgpd    46603   0.0  0.0    22728    25876 ??  Ip     Tue09AM    1:29.11 bgpd: rtr en
> root     94644   0.0  0.0      196      916 ??  Rp/3    3:07PM    0:00.00 grep bgpd
 
Interesting, the size is around 3GB which is somewhat reasonable.
What surprises me is the high CPU load and time spent in both the RDE and
SE. One of my core routers running since last September has about the same
CPU usage that your box collected in a few days. It seems that there is a
lot of churn.

I'm not sure why the process gets killed at around 3GB. It feels like you
hit the ulimit. See Stuart's mail about how to look into that.
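
For reference, a process can inspect the data size limit it runs under with
getrlimit(2). The following is a minimal sketch; the helper names are made
up, and that the datasize limit is the one being hit here is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: fetch the soft and hard data size limits.
 * Returns 0 on success, -1 on error (errno is set by getrlimit).
 */
int
data_limit(rlim_t *soft, rlim_t *hard)
{
	struct rlimit rl;

	assert(soft != NULL && hard != NULL);
	if (getrlimit(RLIMIT_DATA, &rl) == -1)
		return -1;
	*soft = rl.rlim_cur;
	*hard = rl.rlim_max;
	return 0;
}

/* Print one limit in megabytes, spelling out RLIM_INFINITY. */
void
print_limit(const char *name, rlim_t v)
{
	if (v == RLIM_INFINITY)
		printf("%s: unlimited\n", name);
	else
		printf("%s: %llu MB\n", name, (unsigned long long)(v >> 20));
}
```

If the soft limit reported this way is close to 3GB, allocations failing
with "Cannot allocate memory" at that size would be the expected result.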
 
So looking at this output I feel like you somehow created a BGP update
loop where one or more systems are constantly sending UPDATEs to each
other because the moment the update is processed the route decision
changes and flaps back resulting in a withdraw or update.

You can check the 'bgpctl show' and 'bgpctl show nei ' output to
see between which peers many messages are sent. From there on you need to
see which prefixes cause this update storm. Probably some filter rule
causes this.

My assumption is that because of this UPDATE loop the systems slowly kill
each other by pushing more and more updates into various buffers along the
way.

> During the crash, bgpctl show rib mem doesn't work.
> Here is the ps aux | grep bgp output during the crash:
> 
> _bgpd    25479  0.0  0.0     0     0 ??  Zp     -         0:00.00 (bgpd)
> _bgpd    46603  0.0  0.0     0     0 ??  Zp     -         0:00.00 (bgpd)
> _bgpd     8696  0.0  0.0     0     0 ??  Zp     -         0:00.00 (bgpd)
> root     76428  0.0  0.0   180   772 ??  R/2    3:08PM    0:00.00 grep bgpd
> 
> 
> Please note /var/log/messages output:
> 
> Mar 30 15:07:27 bgpgw-004 bgpd[17103]: peer closed imsg connection
> Mar 30 15:07:27 bgpgw-004 bgpd[17103]: main: Lost connection to RDE
> Mar 30 15:07:27 bgpgw-004 bgpd[46603]: peer closed imsg connection
> Mar 30 15:07:27 bgpgw-004 bgpd[46603]: RTR: Lost connection to RDE
> Mar 30 15:07:27 bgpgw-004 bgpd[46603]: peer closed imsg connection
> Mar 30 15:07:27 bgpgw-004 bgpd[46603]: fatal in RTR: Lost connection to
> parent
> Mar 30 15:07:27 bgpgw-004 bgpd[8696]: peer closed imsg connection
> Mar 30 15:07:27 bgpgw-004 bgpd[8696]: SE:

Re: OpenBGPd: fatal in RDE: aspath_get: Cannot allocate memory

2022-03-29 Thread Claudio Jeker
On Tue, Mar 29, 2022 at 09:53:56AM +0200, Laurent CARON wrote:
> Hi,
> 
> I'm happily running several OpenBGPd routers (OpenBSD 7.0).
> 
> After having applied the following filters (to blackhole traffic from
> certain countries):
> 
> include "/etc/bgpd/deny-asn.ru.bgpd"
> include "/etc/bgpd/deny-asn.by.bgpd"
> include "/etc/bgpd/deny-asn.ua.bgpd"
> 
> 
> # head /etc/bgpd/deny-asn.ru.bgpd
> match from any AS 2148 set { localpref 250 nexthop blackhole }
> match from any AS 2585 set { localpref 250 nexthop blackhole }
> match from any AS 2587 set { localpref 250 nexthop blackhole }
> match from any AS 2599 set { localpref 250 nexthop blackhole }
> match from any AS 2766 set { localpref 250 nexthop blackhole }
> match from any AS 2848 set { localpref 250 nexthop blackhole }
> match from any AS 2854 set { localpref 250 nexthop blackhole }
> match from any AS 2875 set { localpref 250 nexthop blackhole }
> match from any AS 2878 set { localpref 250 nexthop blackhole }
> match from any AS 2895 set { localpref 250 nexthop blackhole }
> 
> The bgpd daemon crashes every few days with the following:
> 
> Mar 21 11:36:54 bgpgw-004 bgpd[76476]: 338 roa-set entries expired
> Mar 21 12:06:54 bgpgw-004 bgpd[76476]: 36 roa-set entries expired
> Mar 21 12:11:54 bgpgw-004 bgpd[76476]: 82 roa-set entries expired
> Mar 21 12:22:36 bgpgw-004 bgpd[99215]: fatal in RDE: prefix_alloc: Cannot
> allocate memory
> Mar 21 12:22:36 bgpgw-004 bgpd[65049]: peer closed imsg connection
> Mar 21 12:22:36 bgpgw-004 bgpd[65049]: main: Lost connection to RDE
> Mar 21 12:22:36 bgpgw-004 bgpd[76476]: peer closed imsg connection
> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: peer closed imsg connection
> Mar 21 12:22:36 bgpgw-004 bgpd[76476]: RTR: Lost connection to RDE
> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: SE: Lost connection to RDE
> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: peer closed imsg connection
> Mar 21 12:22:36 bgpgw-004 bgpd[76476]: peer closed imsg connection
> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: SE: Lost connection to RDE control
> Mar 21 12:22:36 bgpgw-004 bgpd[76476]: fatal in RTR: Lost connection to
> parent
> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: Can't send message 61 to RDE, pipe
> closed
> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: peer closed imsg connection
> Mar 21 12:22:36 bgpgw-004 bgpd[58155]: SE: Lost connection to parent
> ...
> 
> Mar 24 06:34:17 bgpgw-004 bgpd[83062]: 17 roa-set entries expired
> Mar 24 06:54:47 bgpgw-004 bgpd[82782]: fatal in RDE: communities_copy:
> Cannot allocate memory
> Mar 24 06:54:47 bgpgw-004 bgpd[99753]: peer closed imsg connection
> Mar 24 06:54:47 bgpgw-004 bgpd[83062]: peer closed imsg connection
> Mar 24 06:54:47 bgpgw-004 bgpd[99753]: main: Lost connection to RDE
> Mar 24 06:54:47 bgpgw-004 bgpd[83062]: RTR: Lost connection to RDE
> Mar 24 06:54:47 bgpgw-004 bgpd[83062]: peer closed imsg connection
> Mar 24 06:54:47 bgpgw-004 bgpd[83062]: fatal in RTR: Lost connection to
> parent
> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: peer closed imsg connection
> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: SE: Lost connection to RDE
> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: peer closed imsg connection
> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: SE: Lost connection to RDE control
> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: Can't send message 61 to RDE, pipe
> closed
> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: peer closed imsg connection
> Mar 24 06:54:47 bgpgw-004 bgpd[40748]: SE: Lost connection to parent
> ...
> 
> Mar 27 13:07:56 bgpgw-004 bgpd[95001]: fatal in RDE: aspath_get: Cannot
> allocate memory
> Mar 27 13:07:56 bgpgw-004 bgpd[84816]: peer closed imsg connection
> Mar 27 13:07:56 bgpgw-004 bgpd[84816]: main: Lost connection to RDE
> Mar 27 13:07:56 bgpgw-004 bgpd[3118]: peer closed imsg connection
> Mar 27 13:07:56 bgpgw-004 bgpd[3118]: RTR: Lost connection to RDE
> Mar 27 13:07:56 bgpgw-004 bgpd[3118]: peer closed imsg connection
> Mar 27 13:07:56 bgpgw-004 bgpd[3118]: fatal in RTR: Lost connection to
> parent
> Mar 27 13:07:56 bgpgw-004 bgpd[60695]: peer closed imsg connection
> Mar 27 13:07:56 bgpgw-004 bgpd[60695]: SE: Lost connection to RDE
> Mar 27 13:07:56 bgpgw-004 bgpd[60695]: peer closed imsg connection
> Mar 27 13:07:56 bgpgw-004 bgpd[60695]: SE: Lost connection to RDE control
> Mar 27 13:07:56 bgpgw-004 bgpd[60695]: peer closed imsg connection
> Mar 27 13:07:56 bgpgw-004 bgpd[60695]: SE: Lost connection to parent
> 
> Is my filter too aggressive for bgpd ? Is there a more efficient way to
> write it ?
 
I doubt it is the filters. You run into some sort of memory leak. Please
monitor 'bgpctl show rib mem' output. Also check ps aux | grep bgpd output 
to see why and when the memory starts to go up.
With that information it may be possible to figure out where this leak
sits and how to fix it.

Cheers
-- 
:wq Claudio



Re: httpd HTTP/2 and HTTP/3 support

2022-01-03 Thread Claudio Jeker
On Fri, Dec 31, 2021 at 09:36:54AM -0000, Stuart Henderson wrote:
> On 2021-12-31, Georg Pfuetzenreuter  wrote:
> > Hi,
> > I searched but couldn't find any recent threads.
> > Does httpd support HTTP/2?
> 
> No.
> 
> > Is support for the upcoming HTTP/3 planned?
> 
> guessing but I think this would also be "no".

It is indeed not. HTTP/3 is "just" HTTP/2 over QUIC.

Implementing reliable communication over UDP in userland is a lot of work
and it is rather complicated. On top of that, it is a violation of the OSI
stack and the socket(2) API. And then it needs to be reimplemented for every
application that wants to use it.

-- 
:wq Claudio



Re: Profiling ifconfig

2021-12-16 Thread Claudio Jeker
On Thu, Dec 16, 2021 at 03:55:43PM +0800, Vladimir Nikishkin wrote:
> Hello, everyone
> 
> Recently I had a problem: my system is losing network connectivity,
> although the interface (vio0 on KVM) seemed up. Restarting the
> connection with `ifconfig vio0 down` and `ifconfig vio0 up` restores the
> connection.
> 
> However, when I timed the execution, I found that the second `up` can
> take up to 15 minutes. (Hugely unexpected!) To find out where the
> program is waiting, I decided to recompile ifconfig from source with
> debugging and profiling support.
> 
> Slightly adjusting the commands provided by the Makefile, I came up with
> the following commands:
> 
> ```
> egcc -O0 -g -pg -fPIC  -Werror-implicit-function-declaration  -c ifconfig.c
> egcc -O0 -g -pg -fPIC  -Werror-implicit-function-declaration  -c brconfig.c
> egcc -O0 -g -pg -fPIC  -Werror-implicit-function-declaration  -c sff.c
> egcc  -g -pg -shared -pie -o ifconfig ifconfig.o brconfig.o sff.o -lc -pg
> ```
> 
> However, when I run ./ifconfig compiled like this, I am getting (besides
> the output of ifconfig itself) the following error message:
> 
> ```
> gmon.out: No such file or directory
> ```
> 
> I find this unexpected. Compiling and linking a simple helloworld with
> -pg -g seems to be working fine, and gmon.out is produced as expected.
> 
> What am I doing wrong? Is there something specific that needs to be
> permitted to profile ifconfig?

I doubt the problem is in ifconfig(8) itself; more likely it is an ioctl that
takes long to finish. Anyway, for profiling to work you need to neuter
unveil() in ifconfig, e.g. by changing the code. With unveil active, the
gmon.out file written in the atexit handler can't be created.

-- 
:wq Claudio



Re: bgpd, announce to ibgp from 2 routers, prefixes only show up from 1

2021-11-30 Thread Claudio Jeker
On Mon, Nov 29, 2021 at 10:38:21PM +0100, Sebastian Benoit wrote:
> Stuart Henderson(s...@spacehopper.org) on 2021.11.13 00:11:08 +0000:
> > I have a pair of -current routers running bgpd (let's call them rtr-a
> > and rtr-b) on a subnet which also has some vpn gateways and firewalls.
> > 
> > These routers provide a carp address which the vpn gateways are using
> > as default route. There are some networks behind the vpn gateways (a
> > /32 to accept incoming vpn connections and some other prefixes that vpn
> > clients are numbered from).
> > 
> > rtr-a and rtr-b have static routes to those networks, and they have
> > network statements in bgpd.conf to announce them to their ibgp peers
> > ("network 172.24.232.0/21 set nexthop XXX" etc) so the paths are reachable
> > from the rest of the network. (This is replacing an existing setup using
> > ospf, trying to remove routing protocols from machines that don't really
> > need them).
> > 
> > It is working but something seems a little odd - the paths are announced
> > from both routers briefly and show up on the rest of the network from
> > both rtr-a and rtr-b. But after a few seconds, rtr-b receives these
> > paths from rtr-a, and then rtr-b stops announcing them itself. (they
> > stop showing in "bgpctl sh rib out" on rtr-b; "bgpctl sh nex" does
> > correctly identify the associated nexthops as connected/UP).
> > 
> > Is this expected/correct behaviour?
> 
> It is expected: once rtr-b receives the route from rtr-a, it will run the
> route decision process on it. If both routers are configured identically
> except for the router-id, one of the routes will be preferred at either the
> "oldest path" or the "lowest bgp id" criteria.
> 
> As only one route is a best route, that one will be announced to the
> neighbors. However this is IBGP. In a set of IBGP connected routers, a
> router will not announce a route to other IBGP peers that it received
> on an IBGP session. Thus, rtr-b will stop announcing that route.
> 
> When rtr-a goes down, the session is shut down or the prefix is filtered,
> bgpd won't see the "better" route anymore and will announce its own instead.
> 
> > I'd prefer to have them announced from both rtr-a and rtr-b, so there's
> > no blackhole period if rtr-a is restarted while rtr-b figures out that
> > it should start announcing them, etc. (No need for tracking carp state
> > in this case, I'm not using stateful pf rules on the traffic involved).
> 
> This is a place where ospf might give you faster failover, especially with
> the redistribute ... depend on ... syntax.
>  
> > If rtr-b stops seeing the prefixes from rtr-a (either by taking down
> > the ibgp session, or by filtering) I see the announcements from both
> > rtr-a and rtr-b again. So the obvious workaround is to filter, but
> > I thought I'd ask first in case it's something that is better handled
> > by code changes rather than config.

Or the other way is to alter localpref, as-path or metric of those routes
in some way that makes sure that both rtr-a and rtr-b consider their own
route the "better" one and so keep announcing it.

You can do this in multiple ways. One way would be to use something like
this:
pass out on ibgp metric +1
or
pass in on ibgp metric -1
 
Long term it would be nice to reintroduce route metrics and use this
to sort nexthops in bgpd.

-- 
:wq Claudio



Re: Put non-NULL pledge abort in the man page

2021-11-25 Thread Claudio Jeker
On Thu, Nov 25, 2021 at 04:55:23AM -0600, Luke Small wrote:
> I ran ktrace. Kdump said the last thing it did was try to load
> /usr/libexec/ld.so
> 
> To main(), before the unveil pledge is dropped, I added:
> 
> if (unveil("/usr/libexec/", "rx") == -1)
> err(1, "unveil, line: % d", __LINE__);
> 
> After running it again, it spits out an error message:
> 
> ld.so: pkg_ping: can't load library 'libc.so.96.1'
> 
> So I put in:
> 
> if (unveil("usr/lib/", "rx") == -1)
> err(1, "unveil, line: %d", __LINE__);
> 
> Now it successfully execv()s into the new process space!
> Now in the newly created program, which hasn’t set new pledge execpromises,
> it won’t successfully run ftp(1) because it wasn’t granted the inet
> execpromise.
> 
> execpromises seems to have carried over!

Don't use execpromises. That feature is not working and no tool in OpenBSD
uses it.

-- 
:wq Claudio



Re: Dynamic routing and REJECT,LLINFO,CLONED routes

2021-11-07 Thread Claudio Jeker
On Sun, Nov 07, 2021 at 12:46:43PM +0100, Denis Fondras wrote:
> I came up with this diff to overcome my problem.
> 
> Index: rtable.c
> ===
> RCS file: /cvs/src/sys/net/rtable.c,v
> retrieving revision 1.75
> diff -u -p -r1.75 rtable.c
> --- rtable.c  25 May 2021 22:45:09 -  1.75
> +++ rtable.c  7 Nov 2021 11:21:33 -
> @@ -834,6 +834,10 @@ rtable_mpath_insert(struct art_node *an,
>   return;
>   }
>  
> + /* Unreachable on-link route will not preferred */
> + if (ISSET(mrt->rt_flags, RTF_LLINFO|RTF_REJECT))
> + prio = 0;
> +
>   /* Iterate until we find the route to be placed after ``rt''. */
>   while (mrt->rt_priority <= prio && SRPL_NEXT_LOCKED(mrt, rt_next)) {
>   prt = mrt;
> 
> Le Sun, Nov 07, 2021 at 10:11:54AM +0100, Denis Fondras a écrit :
> > Hi,
> > 
> > I am using BGP to connect 2 OpenBSD-current routers :
> > 
> > [static default GW]---RT1---[bgp]---RT2
> > 
> > I announce an IPv4 /32 from RT2.
> > After I start both RT1 and RT2, traffic flows to RT2 /32 without any issue.
> > However if I reboot RT2 (let's say for sysupgrade), RT1 loses the /32 
> > (which is
> > expected) but as traffic is still directed to the /32 (because of a constant
> > ping towards the /32 for example), RT1 installs a route for the /32 with 
> > these
> > flags :
> > 
> > flags: 
> > (The REJECT flag is dropped after a timeout but comes back a few seconds
> > later)
> > 
> > From there I cannot get the /32 back from BGP until I manually delete the
> > automatically installed HOST route. Is there any way to deal with it without
> > manual intervention ?
> > 
> > Denis

To be honest, you have arp or ND running on that prefix and then overload
it with a /32 route. You really need to explain why you do that. This is
in my opinion a broken setup.

We don't want to add hacks for setups that are inherently broken. If
something is directly connected it should use that direct link.

-- 
:wq Claudio



Re: Asyncronous IO

2021-11-04 Thread Claudio Jeker
On Wed, Nov 03, 2021 at 03:37:01PM +, cho...@jtan.com wrote:
> I program on OpenBSD and am writing a library which presents an API
> for IO. POSIX defines an API[*] for asyncronous IO and I would like
> my code to support it but this API is unavailable in OpenBSD.
> 
> Is the lack intentional (perhaps there are other plans) or is it
> simply the case that no-one has sat down and written it yet?

A bit of both. AIO is not often used. Building an async API on basic
poll/select or on libevent is much preferred.
AIO is complex and in most cases not needed.
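
As a rough illustration of the poll(2) approach, here is a minimal sketch of
a readiness check that an async API could be built around. The function name
wait_readable() is made up for this example and is not part of any OpenBSD
API; only poll(2) itself is real.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if data (or EOF) can be read, 0 on timeout, -1 on error.
 */
int
wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd;
	int n;

	assert(fd >= 0);
	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	n = poll(&pfd, 1, timeout_ms);
	if (n == -1)
		return (-1);	/* includes EINTR; the caller may retry */
	if (n == 0)
		return (0);	/* timed out, nothing to read yet */
	return ((pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1);
}
```

A library would typically call something like this (or hand the descriptor
to libevent) before issuing a non-blocking read(2), instead of queueing an
AIO request.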
 
> I don't mind that the async parts will not (yet) work on OpenBSD
> because I can always test them elsewhere but I would like to know
> which backend API(s) I should write against and therefore what
> OpenBSD intends to do regarding AIO in the future.
> 
> Cheers,
> 
> Matthew
> 
> [*] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/aio.h.html
> 

-- 
:wq Claudio



Re: httpd(8) - Internal Server error (500) on invalid request

2021-10-21 Thread Claudio Jeker
On Thu, Oct 21, 2021 at 04:38:43PM +0200, Sebastian Benoit wrote:
> J. K.(openbsd.l...@krottmayer.com) on 2021.10.21 14:10:16 +0200:
> > Another question, to httpd(8). Tried the following query.
> > Used an invalid HTTP Version number (typo).
> > 
> > $ telnet 10.42.42.183 80
> > [Shortened]
> > GET / HTTP/1.2
> > [content]
> > 
> > httpd provide here the site. Without checking the not existent version
> > (1.2) number and the Host. Okay, that's maybe stupid from me to
> > start a request with an invalid version number. But should not also
> > the server answer with 400 (bad request)?
> > 
> > According to the source only HTTP/1.1 is checked. All other request
> > will be accepted. Okay, I'm not a RFC specialist. Still a newbie.
> 
> This diff makes httpd return "505 HTTP Version Not Supported"
> for < 0.9 and > 1.9 http versions. Anything from 1.1 to 1.9 is
> interpreted as 1.1. This is what nginx does too.
> 
> ok?
> 
> diff --git usr.sbin/httpd/server_http.c usr.sbin/httpd/server_http.c
> index 6a74f3e45c5..52aaf3711c2 100644
> --- usr.sbin/httpd/server_http.c
> +++ usr.sbin/httpd/server_http.c
> @@ -51,6 +51,7 @@ int  server_http_authenticate(struct server_config 
> *,
>   struct client *);
>  char *server_expand_http(struct client *, const char *,
>   char *, size_t);
> +int   http_version_num(char *);
>  
>  static struct http_method http_methods[] = HTTP_METHODS;
>  static struct http_error  http_errors[] = HTTP_ERRORS;
> @@ -198,6 +199,19 @@ done:
>   return (ret);
>  }
>  
> +int http_version_num(char *version)

KNF please.

> +{
> + if (strcmp(version, "HTTP/0.9") == 0)
> + return (9);
> + if (strcmp(version, "HTTP/1.0") == 0)
> + return (10);
> + /* any other version 1.x gets downgraded to 1.1 */
> + if (strncmp(version, "HTTP/1", 6) == 0)
> + return (11);
> +
> + return (0);
> +}
> +
>  void
>  server_read_http(struct bufferevent *bev, void *arg)
>  {
> @@ -207,6 +221,7 @@ server_read_http(struct bufferevent *bev, void *arg)
>   char*line = NULL, *key, *value;
>   const char  *errstr;
>   size_t   size, linelen;
> + int  version;
>   struct kv   *hdr = NULL;
>  
>   getmonotime(&clt->clt_tv_last);
> @@ -329,12 +344,29 @@ server_read_http(struct bufferevent *bev, void *arg)
>   *desc->http_query++ = '\0';
>  
>   /*
> -  * Have to allocate the strings because they could
> +  * We have to allocate the strings because they could
>* be changed independently by the filters later.
> +  * Allow HTTP version 0.9 to 1.1.
> +  * Downgrade http version > 1.1 <= 1.9 to version 1.1.
> +  * Return HTTP Version Not Supported for anything else.
>*/
> - if ((desc->http_version =
> - strdup(desc->http_version)) == NULL)
> - goto fail;
> +
> + version = http_version_num(desc->http_version);

I would prefer that this code not store anything in desc->http_version
until after the strdup() has succeeded. The way these strdup() calls work
is just wonky. Especially in the failure cases this may result in calling
free on the wrong thing.

> + if (version == 11) {
> + if ((desc->http_version =
> + strdup("HTTP/1.1")) == NULL)
> + goto fail;
> + } else {
> + if ((desc->http_version =
> + strdup(desc->http_version)) == NULL)
> + goto fail;
> + }
> +
> + if (version == 0) {
> + server_abort_http(clt, 505, "bad http version");
> + goto abort;
> + }

I would prefer to have this as:
if (version == 0) {
} else if (version == 11) {
} else {
}

-- 
:wq Claudio



Re: httpd(8) - Internal Server error (500) on invalid request

2021-10-21 Thread Claudio Jeker
On Thu, Oct 21, 2021 at 01:21:33PM +0200, Sebastian Benoit wrote:
> J. K.(openbsd.l...@krottmayer.com) on 2021.10.21 11:55:47 +0200:
> > Hi,
> > 
> > I don't know if this is a real issue from OpenBSD's httpd(8).
> > Tried some requests to httpd(8) for the purpose of education.
> > 
> > Simple tried the following request:
> > 
> > $ telnet 10.42.42.183 80
> > Trying 10.42.42.183...
> > Connected to 10.42.42.183.
> > Escape character is '^]'.
> > GET / HTTP/1.1
> > fasfsdfsfd
> > 
> > Here without the colon httpd(8) return an internal server
> > error.
> > 
> > Can somebody verify this behavior?
> > 
> > Noticed with OpenBSD 7.0. Is this a correct behavior (RFC
> > conform)?
> > 
> > Thanks in advance!
> > 
> > Kind regrads,
> > 
> > J. K.
> 
> Hi,
> 
> yes. The server should probably answer with a "Bad Request" instead.
> 
> Fix below. ok?

OK claudio@
 
> diff --git usr.sbin/httpd/server_http.c usr.sbin/httpd/server_http.c
> index 732add41283..fce3c21af72 100644
> --- usr.sbin/httpd/server_http.c
> +++ usr.sbin/httpd/server_http.c
> @@ -268,8 +268,14 @@ server_read_http(struct bufferevent *bev, void *arg)
>   else if (*key == ' ' || *key == '\t')
>   /* Multiline headers wrap with a space or tab */
>   value = NULL;
> - else
> + else {
> + /* Not a multiline header, should have a : */
>   value = strchr(key, ':');
> + if (value == NULL) {
> + server_abort_http(clt, 400, "malformed");
> + goto abort;
> + }
> + }
>   if (value == NULL) {
>   if (clt->clt_line == 1) {
>   server_abort_http(clt, 400, "malformed");
> 

-- 
:wq Claudio



Re: problems with outbound load-balancing (PF sticky-address for destination IPs)

2021-09-29 Thread Claudio Jeker
On Wed, Sep 29, 2021 at 08:07:43PM +1000, Andrew Lemin wrote:
> Hi Claudio,
> 
> So you probably guessed I am using 'route-to { GW1, GW2, GW3, GW4 } random'
> (and was wanting to add 'sticky-address' to this) based on your reply :)
> 
> "it will make sure that selected default routes are sticky to source/dest
> pairs" - Are you saying that even though multipath routing uses hashing to
> select the path (https://www.ietf.org/rfc/rfc2992.txt - "The router first
> selects a key by performing a hash (e.g., CRC16) over the packet header
> fields that identify a flow."), subsequent new sessions to the same dest IP
> with different source ports will still get the same path? I thought a new
> session with a new tuple to the same dest IP would get a different hashed
> path with multipath?

OpenBSD multipath routing implements gateway selection by Hash-Threshold
from RFC 2992. It therefore routes the same src/dst pair over the same
nexthop as long as there are no changes to the route. If one of your
links drops then some sessions will move links but the goal of
hash-threshold is to minimize the affected sessions.

> "On rerouting the multipath code reshuffles the selected routes in a way to
> minimize the affected sessions." - Are you saying, in the case where one
> path goes down, it will migrate all the entries only for that failed path
> onto the remaining good paths (like ecmp-fast-reroute ?)

No, some sessions on good paths may also migrate to other links; this is
how the hash-threshold algorithm works.

Split with 4 nexthops, now let's assume link 2 dies and stuff gets
reshuffled:
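+==============+==============+==============+==============+
|   link   1   |   link   2   |   link   3   |   link   4   |
+==============+====+=========+=========+====+==============+
|      link   1     |      link   3     |      link   4     |
+===================+===================+===================+
Unaffected sessions for drop
 ^^^^^^^^^^^^^^                ^^^^^^^^^      ^^^^^^^^^^^^^^
Affected sessions because of drop
                ##############           ####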
Using other ways to split the hash into buckets (e.g. a simple modulo)
causes more change.
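
For illustration, a minimal sketch of hash-threshold selection in the style
of RFC 2992: the 32-bit hash space is cut into equal regions, one per
nexthop, and the flow hash picks the region. This is not the kernel code;
hash_threshold() is a made-up name, and the hash is assumed to be an
already computed 32-bit value over the src/dst pair.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a 32-bit flow hash to a nexthop index in
 * [0, nhops) by splitting the hash space into nhops equal regions.
 */
unsigned int
hash_threshold(uint32_t hash, unsigned int nhops)
{
	uint64_t region;

	assert(nhops > 0);
	/* region size, rounded up so that the last index is nhops - 1 */
	region = (((uint64_t)1 << 32) + nhops - 1) / nhops;
	return ((unsigned int)(hash / region));
}
```

With 4 nexthops (link 1 to link 4 at index 0 to 3) a hash of 0x90000000
lands on index 2, which is link 3. After link 2 is removed the list is
link 1, link 3, link 4, and the same hash lands on index 1, which is still
link 3. A hash of 0xB0000000 moves from link 3 to link 4, which is the
small affected slice of link 3 in the picture.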

Btw. using route-to with 4 gateways will not detect a link failure, so 25%
of your traffic will be dropped when one link fails. This is another
advantage of multipath routing.

Cheers
-- 
:wq Claudio

> Thanks for your time, Andy.
> 
> On Wed, Sep 29, 2021 at 5:21 PM Claudio Jeker 
> wrote:
> 
> > On Wed, Sep 29, 2021 at 02:17:59PM +1000, Andrew Lemin wrote:
> > > I see this question died on its arse! :)
> > >
> > > This is still an issue for outbound load-balancing over multiple internet
> > > links.
> > >
> > > PF's 'sticky-address' parameter only works on source IPs (because it was
> > > originally designed for use when hosting your own server pools - inbound
> > > load balancing).
> > > I.e. There is no way to configure 'sticky-address' to consider
> > destination
> > > IPs for outbound load balancing, so all subsequent outbound connections
> > to
> > > the same target IP originate from the same internet connection.
> > >
> > > The reason why this is desirable is because an increasing number of
> > > websites use single sign on mechanisms (quite a few different
> > architectures
> > > expose the issue described here). After a users outbound connection is
> > > initially randomly load balanced onto an internet connection, their
> > browser
> > > is redirected into opening multiple additional sockets towards the
> > > website's load balancers / cloud gateways, which redirect the connections
> > > to different internal servers for different parts of the site/page, and
> > the
> > > SSO authentication/cookies passed on the additional sockets must to
> > > originate from the same IP as the original socket. As a result outbound
> > > load-balancing does not work for these sites.
> > >
> > > The ideal functionality would be for 'sticky-address' to consider both
> > > source IP and destination IP after initially being load balanced by
> > > round-robin or random.
> >
> > Just use multipath routing, it will make sure that selected default routes
> > are sticky to source/dest pairs. You may want the states to be interface
> > bound if you need to nat-to on those links.
> >
> > On rerouting the multipath code reshuffles the selected routes in a way to
> > minimize the affected sessions. All this is done without any extra memory
> > usage since the hashing function is smart.
> >
> > --
> > :wq Claudio

Re: problems with outbound load-balancing (PF sticky-address for destination IPs)

2021-09-29 Thread Claudio Jeker
On Wed, Sep 29, 2021 at 02:17:59PM +1000, Andrew Lemin wrote:
> I see this question died on its arse! :)
> 
> This is still an issue for outbound load-balancing over multiple internet
> links.
> 
> PF's 'sticky-address' parameter only works on source IPs (because it was
> originally designed for use when hosting your own server pools - inbound
> load balancing).
> I.e. There is no way to configure 'sticky-address' to consider destination
> IPs for outbound load balancing, so all subsequent outbound connections to
> the same target IP originate from the same internet connection.
> 
> The reason why this is desirable is because an increasing number of
> websites use single sign on mechanisms (quite a few different architectures
> expose the issue described here). After a users outbound connection is
> initially randomly load balanced onto an internet connection, their browser
> is redirected into opening multiple additional sockets towards the
> website's load balancers / cloud gateways, which redirect the connections
> to different internal servers for different parts of the site/page, and the
> SSO authentication/cookies passed on the additional sockets must to
> originate from the same IP as the original socket. As a result outbound
> load-balancing does not work for these sites.
> 
> The ideal functionality would be for 'sticky-address' to consider both
> source IP and destination IP after initially being load balanced by
> round-robin or random.

Just use multipath routing, it will make sure that selected default routes
are sticky to source/dest pairs. You may want the states to be interface
bound if you need to nat-to on those links.

On rerouting the multipath code reshuffles the selected routes in a way to
minimize the affected sessions. All this is done without any extra memory
usage since the hashing function is smart.

-- 
:wq Claudio

 
> Thanks again, Andy.
> 
> On Sat, Apr 3, 2021 at 12:40 PM Andy Lemin  wrote:
> 
> > Hi smart people :)
> >
> > The current implementation of ‘sticky-address‘ relates only to a sticky
> > source IP.
> > https://www.openbsd.org/faq/pf/pools.html
> >
> > This is used for inbound server load balancing, by ensuring that all
> > socket connections from the same client/user/IP on the internet goes to the
> > same server on your local server pool.
> >
> > This works great for ensuring simplified memory management of session
> > artefacts on the application being hosted (the servers do not have to
> > synchronise the users session data as extra sockets from that user will
> > always connect to the same local server)
> >
> > However sticky-address does not have an equivalent for sticky destination
> > IPs. For example when doing outbound load balancing over multiple ISP
> > links, every single socket is load balanced randomly. This causes many
> > websites to break (especially cookie login and single-sign-on style
> > enterprise services), as the first outbound socket will originate randomly
> > from one of the local ISP IPs, and the users login session/SSO (on the
> > server side) will belong to that first random IP.
> >
> > When the user then browses to or uses another part of that same website
> > which requires additional sockets, the additional sockets will pass the SSO
> > credentials from the first socket, but the extra socket connection will
> > again be randomly load-balanced, and so the remote server will reject the
> > connection as it is originating from the wrong source IP etc.
> >
> > Therefore can I please propose a “sticky-address for destination IPs” as
> > an analogue to the existing sticky-address for source IPs?
> >
> > This is now such a problem that we have to use sticky-address even on
> > outbound load-balancing connections, which causes internal user1 to always
> > use the same ISP for _everthing_ etc. While this does stop the breakage, it
> > does not result in evenly distributed balancing of traffic, as users are
> > locked to one single transit, for all their web browsing for the rest of
> > the day after being randomly balanced once first-thing in the morning,
> > rather than all users balancing over all transits throughout the day.
> >
> > Another pain; using the current source-ip sticky-address for outbound
> > balancing, makes it hard to drain transits for maintenance. For example
> > without source sticky-address balancing, you can just remove the transit
> > from the Pf rule, and after some time, all traffic will eventually move
> > over to the other transits, allowing the first to be shut down for whatever
> > needs. But with the current source-ip sticky-address, that first transit
> > will take months to drain in a real-world situations..
> >
> > lastly just as a nice-to-have, how feasible would a deterministic load
> > balancing algorithm be? So that balancing selection is done based on the
> > “least utilised” path?
> >
> > Thanks for your time and consideration,
> > Kindest regards Andy
> >
> >
> >
> > Sent from a teeny tiny 

Re: Blog comparing open source BGP stacks

2021-08-25 Thread Claudio Jeker
On Wed, Aug 25, 2021 at 02:01:26PM +0200, Kristjan Komlosi wrote:
> On 24. 08. 21 21:59, Laura Smith wrote:
> > Would be interesting to hear comments from the community on this comparison 
> > : https://elegantnetwork.github.io/posts/followup-measuring-BGP-stacks/
> > 
> > N.B. For the record, don't shoot the messenger, I had nothing to do with 
> > these tests, I just became aware of them via the BIRD list.  I am 
> > particularly interested in the OpenBSD community comments given one person 
> > on the BIRD list had this to say of OpenBGPD: "OpenBGPd has always been a 
> > dog.".
> > 
> 
> I'm no expert at all, but I'd imagine that OpenBGPD performs at least
> somewhat differently on Linux, which seems to be what the author used in the
> tests. My personal BGP server runs OpenBSD on a 512MB VPS, using about 150MB
> of RAM with full IPv6 table and routing my traffic just fine, though I can
> imagine the tables turning very quickly with lots of neighbors, as the
> benchmark shows. I could try replicating their setup on an OpenBSD system,
> but I don't have good enough hardware at hand at the moment.

The massive amount of memory used in OpenBGPD comes from the fact that,
unlike BIRD, OpenBGPD runs with a full Adj-RIB-Out.
The tests result in a large number of prefixes that need to be tracked.
If you have 100 peers announcing 10000 random prefixes each then you end up
with 100 * 100 * 10000 = 100 million elements to manage. This is not a
realistic test since in most cases the number of routes in the Adj-RIB-Out is
limited (even on route servers). In the end for day to day use OpenBGPD
performs well enough for many people. Future releases will focus more on
performance and optimizing Adj-RIB-Out is on the list.
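
The arithmetic can be written down as a small sketch, taking 10000 prefixes
per peer, which is the value the 100 million total implies. The function name
is made up for this example and the formula is only the back-of-the-envelope
estimate from this mail, not how bgpd accounts for memory.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical estimate: every prefix learned from every peer is
 * tracked once per peer in a full Adj-RIB-Out.
 */
uint64_t
adj_rib_out_entries(uint64_t peers, uint64_t prefixes_per_peer)
{
	assert(peers > 0);
	return (peers * peers * prefixes_per_peer);
}
```

The quadratic term in the number of peers is what makes the benchmark so
much heavier than a typical deployment.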

-- 
:wq Claudio



Re: WireGuard host crashes roughly every week

2021-08-04 Thread Claudio Jeker
On Wed, Aug 04, 2021 at 08:36:07PM +1000, Matt Dunwoodie wrote:
> On Tue, 3 Aug 2021 13:02:15 -0500
> "Matt P."  wrote:
> 
> > Hi Stuart!
> > 
> > Your advice lead me to discover, the issue happens only with the
> > "PersistantKeepalive = 25" option I had enabled on each wg-quick
> > peer. Looks like you could recreate it by making a few no-address
> > peers with this option enabled.
> 
> Hi Matt,
> 
> This insight was very helpful. It looks like mbufs are not freed if
> we're sending to a peer with no endpoint. Specifically, "wg_send" is
> expected to free the mbuf if there is an error sending. This (untested)
> patch should fix it.
> 
> Cheers,
> Matt
> 
> diff --git if_wg.c if_wg.c
> index 18333eda4cb..5f4319558ab 100644
> --- if_wg.c
> +++ if_wg.c
> @@ -810,6 +810,7 @@ wg_send(struct wg_softc *sc, struct wg_endpoint *e, 
> struct mbuf *m)
>   IPPROTO_IPV6);
>  #endif
>   } else {
> + m_freem(m);
>   return EAFNOSUPPORT;
>   }
>  
> 

Diff looks sensible. OK claudio@

-- 
:wq Claudio



Re: iked choosing the wrong policy?

2021-07-27 Thread Claudio Jeker
On Tue, Jul 27, 2021 at 07:32:09AM -, Stuart Henderson wrote:
> On 2021-07-27, Vladimir Nikishkin  wrote:
> > Hello, everyone.
> >
> > This is my iked.conf:
> >
> > ```
> > ikev2 "for-phone" passive esp \
> > from any to 10.0.3.2/32 \
> > local egress peer any \
> ...
> > dstid phone.mine \
> 
> > ikev2 "for-laptop" passive esp \
> > from any to 10.0.3.3/32 \
> > local egress peer any \
> ...
> > dstid laptop.mine \
> 
> Two policies with "peer any" doesn't work.
> 
> > How to correct the setup?
> 
> Maybe it's possible by modifying the code, I'm not sure if the
> id is sent early enough though so it might not be possible.

This is one of the biggest annoyances of iked. It does not even help to
use different IPs and 'local' to split up the rules. I would love it if
someone fixed this.

-- 
:wq Claudio



Re: DHCP non-issues

2021-07-20 Thread Claudio Jeker
On Tue, Jul 20, 2021 at 08:53:03AM -, Stuart Henderson wrote:
> On 2021-07-19, jungle Boogie  wrote:
> > On Mon, 19 Jul 2021 at 04:48, Christian Weisgerber  
> > wrote:
> >>
> >> Look guys, it's simple.
> >>
> >> If you want IPv6 (SLAAC) autoconfiguration, you set "inet6 autoconf"
> >> for that interface.  slaacd(8) will then automatically handle things.
> >>
> >> If you want IPv4 (DHCP) autoconfiguration, you set "inet autoconf"
> >> for that interface.  dhcpleased(8) will then automatically handle
> >> things.  If you require special DHCP options that dhcpleased(8)
> >> doesn't include, then you don't enable autoconfiguration and run
> >> dhclient(8) instead, which can be extensively configured.
> >>
> >> Both slaacd(8) and dhcpleased(8) pass nameserver information to
> >> resolvd(8), which adds those nameservers to /etc/resolv.conf unless
> >> unwind(8) is running.  If you don't want that to happen for some
> >> other reason, you turn off resolvd(8).
> >>
> >
> > Sounds like great information to put in current.html:
> > https://www.openbsd.org/faq/current.html
> > I think folks are surprised by the change and want to know how to
> > handle the new daemons in certain situations.
> > Your explanation above is very helpful and probably could be used in
> > current.html
> > I imagine the 7.0 "what's new" section will contain something similar.
> >
> >
> > What do I need to do to have WireGuard start at boot when I want to
> > use a hostname in my hostname.wg0 interface file?
> >
> > Currently, the interface doesn't come up as expected:
> > ifconfig: no address associated with name
> >
> > Are these my options?
> > a. use dhclient
> > b. make a script to start the interface later
> > c. use ip address
> 
> or d. add an entry to /etc/hosts
> 
> Some people are also running into problems with hostnames in pf.conf;
> a c and d apply in that case too.
> 
> Some of this could be fixed by having a way to ask dhcpleased to wait
> (with timeout) for an address during boot. For your example with wg,
> as well as that, netstart would need to be split i.e. start standard
> interfaces, then dhcpleased/unwind/resolvd, then tunnel interfaces.
> 
> I was going to say the same would apply for hostnames used in fstab
> if /usr and /var are NFS-mounted; but actually /usr and /var can't
> be NFS-mounted if you rely on addresses from dhcpleased to reach the
> NFS server anyway (these daemons need access to /var so they need
> to be started after /usr and /var are mounted).
> 

Actually this needs to be fixed in /etc/netstart, dhcpleased / slaacd. Until
now systems with dynamic IPs had the 10 second wait of dhclient to make sure
the interfaces are up and configured. This is no longer the case, and because
of this stuff breaks left and right.

Up until now the system relied on the fact that after /etc/netstart ran
the interfaces were up and configured (static or dynamic) and all
following services relied on this fact. Honestly adding host entries is
not a solution because it will not work in all cases, e.g. pf rules using
interface names as addresses will not work correctly.

There must be a way to wait at the end of netstart to ensure that network
configuration settled or timed out. IIRC dlg@ had a diff that allowed
something along these lines.

We already hit this issue with slaacd on IPv6-only setups and ignored it.
Now it affects everyone, let's not ignore it again.
-- 
:wq Claudio



Re: VLANs isolation

2021-07-13 Thread Claudio Jeker
On Tue, Jul 13, 2021 at 11:34:28AM +0200, Radek wrote:
> Hello,
> I'm going to build a router with +40 vlans.
> I need to block access from every vlan to each other (and then enable traffic 
> between certain vlans as needed).
> 
> How can I do this? Is there any one liner pf block rule to do this?  

Not really but you can try:

block out on vlan received-on vlan

It really depends on how you want to build your filters (outbound or
inbound filtering). Maybe it is better to just start with a block all rule
and slowly allow traffic back. You can use interface groups and pf tags to
help with rule writing.

-- 
:wq Claudio



Re: rpki-client and BLACKHOLE routes

2021-06-23 Thread Claudio Jeker
On Wed, Jun 23, 2021 at 11:40:25AM +0200, Hrvoje Popovski wrote:
> Hi all,
> 
> first of all, thank you for rpki-client, it's so easy to use it and to
> get the job done.
> I'm playing with rpki-client and denying ovs invalid statement and I've
> seen that with default ovs config statement (deny from ebgp ovs invalid)
> BLACKHOLE routes are blocked/invalid.
> 
> What is the right way to allow BLACKHOLE routes through rpki ? Or if
> someone can give me a hint on what to do.
> 

BLACKHOLE routes normally have a more specific check so you can re-allow
them after the ovs invalid check (for that you need to remove the
quick from the default ruleset, or allow the blackholes with quick
before it).

I guess you can use something along the lines of:
allow quick from group clients inet prefixlen 32 community $BLACKHOLE set nexthop blackhole
allow quick from group clients inet6 prefixlen 128 community $BLACKHOLE set nexthop blackhole

I guess you also have some client prefix-sets that should be added to the
filter rule so that one client can not blackhole for another.

BLACKHOLE routes are done in many ways and I'm not sure if there is
consensus who is allowed to announce what. Also if there are multiple
paths to the destination should the blackhole only be active if the
covering route is from the same peer?

-- 
:wq Claudio



Re: EACCES of UDP packet

2021-06-22 Thread Claudio Jeker
On Tue, Jun 22, 2021 at 04:48:26PM +0800, Siegfried Levin wrote:
> > Why have you chosen to hide information that may be useful in debugging 
> > your problem?
> 
> I’m truly sorry for the inconvenience but I do have some concerns of security 
> and privacy. I confirm it is not a broadcast address because it is the public 
> IP of the server and this issue has a probability of 1% to happen. The 
> address cannot just be a broadcast address at 1% of the time while not at the 
> rest of 99%. I also double checked it by SSHing to the address I copied from 
> the kdump, if it makes sense.
> 
> > So, since the manpage mentions blocking pf, I suggest the hypothesis "it 
> > returns EACCES because pf is blocking your packets".  I can think of 
> > several ways to test that; what testing have you performed to confirm or 
> > rule out that possibility?  "doas pfctl -d; run test; doas pfctl -e”?
> 
> This issue is really hard to reproduce because the application works at most 
> of the time, but I think you are right. I’ll be watching the pf log in next 
> weeks.
> 

Also check the various counters of netstat -s and especially pfctl -si (or
systat pf). In the pfctl output especially check memory, congestion or
state errors.

-- 
:wq Claudio



Re: Prometheus on OpenBSD - does it work?

2021-06-15 Thread Claudio Jeker
On Tue, Jun 15, 2021 at 04:24:08PM +0200, Julien Pivotto wrote:
> Hello,
> 
> I am a Prometheus maintainer and we have received a bug regarding
> Prometheus - prometheus would no longer work on OpenBSD since we
> introduced MMAP:
> 
> https://github.com/prometheus/prometheus/issues/8877
> https://github.com/prometheus/prometheus/issues/8799
> 
> I would like to know if the facts here are accurate and, on the
> opposite, if there are happy openbsd users of Prometheus 2.19+.
> 
> I see that Prometheus 2.24 is packaged upstream, so I guess there are
> users. Can you please interact with us so we can better understand the
> situation at play.
> 

Unlike other OSes, OpenBSD does not automatically keep mmap-ed memory of a
file in sync with write() calls to the same file (OpenBSD has no unified
cache). It requires use of msync(2) to make sure that mappings are
properly updated.
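
A minimal C sketch of the pattern, for illustration only: the file is
changed through the descriptor, then msync(2) is called on the shared
mapping before the data is read back through it. The function name and the
scratch file handling are made up for this example. MS_ASYNC is used because
that is the flag in the patch in this mail; whether it is the right flag for
every case is not something this sketch settles.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo: write msg into a scratch file with pwrite(2),
 * msync the mapping of that file, then read the bytes via the mapping.
 * Returns 0 if the mapping shows the new data, 1 if not, -1 on error.
 */
int
write_then_msync(const char *path, const char *msg)
{
	size_t len, maplen;
	char *map;
	int fd, rv = -1;

	assert(path != NULL && msg != NULL);
	len = strlen(msg);
	maplen = (size_t)sysconf(_SC_PAGESIZE);
	if (len > maplen)
		return (-1);
	if ((fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600)) == -1)
		return (-1);
	if (ftruncate(fd, (off_t)maplen) == -1)
		goto done;
	map = mmap(NULL, maplen, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto done;

	/* Update the file through the descriptor, not the mapping. */
	if (pwrite(fd, msg, len, 0) == (ssize_t)len &&
	    msync(map, maplen, MS_ASYNC) == 0)
		rv = (memcmp(map, msg, len) == 0) ? 0 : 1;

	munmap(map, maplen);
done:
	close(fd);
	unlink(path);
	return (rv);
}
```

On systems with a unified cache the msync(2) call makes no visible
difference here, which is probably why code like TSDB gets away without it
elsewhere.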

While prometheus works, it also does not. I looked into the code of TSDB
and came to the conclusion that many operations (especially compaction)
fail because TSDB writes to file handles but uses mmaps of the same files
at the same time.

I fixed one case (the one mentioned in the issues, in index/index.go),
but then more errors show up when running the tsdb go tests, including a
SEGV in db_test.go.

I played a bit more with this. With the bad test in db_test.go skipped, it
seems to mostly pass but errors out at the end:

level=error msg="WAL corruption detected; truncating" err="unexpected
CRC32 checksum 7c1a52ff, want 1020304"
file=/tmp/test_corrupted095078964/01 pos=44
PASS
goleak: Errors on successful test run: found unexpected goroutines:
[Goroutine 17761 in state chan send, with
github.com/prometheus/prometheus/tsdb.(*SegmentWAL).cut.func1 on top of
the stack:
goroutine 17761 [chan send]:
github.com/prometheus/prometheus/tsdb.(*SegmentWAL).cut.func1(0xc001262fd0,
0xc0eff0)
/usr/ports/pobj/prometheus-2.27.1/go/src/all/tsdb/wal.go:571 +0x72
created by github.com/prometheus/prometheus/tsdb.(*SegmentWAL).cut
/usr/ports/pobj/prometheus-2.27.1/go/src/all/tsdb/wal.go:570 +0x7a

 Goroutine 18135 in state chan send, with
github.com/prometheus/prometheus/tsdb.(*SegmentWAL).cut.func1 on top of
the stack:
goroutine 18135 [chan send]:
github.com/prometheus/prometheus/tsdb.(*SegmentWAL).cut.func1(0xc99290,
0xc000be24b0)
/usr/ports/pobj/prometheus-2.27.1/go/src/all/tsdb/wal.go:571 +0x72
created by github.com/prometheus/prometheus/tsdb.(*SegmentWAL).cut
/usr/ports/pobj/prometheus-2.27.1/go/src/all/tsdb/wal.go:570 +0x7a
]
exit status 1
FAIL    github.com/prometheus/prometheus/tsdb   83.561s

The TSDB code is very hard to follow and debug. There are mmaps all over
the place and it is unclear which files are written to and which are not.
Also the MmapFile structs are not stored in other structs, so it is
not that simple to call msync.
-- 
:wq Claudio

$OpenBSD$

Add msync to sync mmap buffers

diff --git tsdb/fileutil/mmap.go tsdb/fileutil/mmap.go
index 4dbca4f97..516991c60 100644
--- tsdb/fileutil/mmap.go
+++ tsdb/fileutil/mmap.go
@@ -71,3 +71,7 @@ func (f *MmapFile) File() *os.File {
 func (f *MmapFile) Bytes() []byte {
 	return f.b
 }
+
+func (f *MmapFile) Sync() error {
+	return sync(f.b)
+}
diff --git tsdb/fileutil/mmap_unix.go tsdb/fileutil/mmap_unix.go
index 043f4d408..c21829989 100644
--- tsdb/fileutil/mmap_unix.go
+++ tsdb/fileutil/mmap_unix.go
@@ -28,3 +28,7 @@ func mmap(f *os.File, length int) ([]byte, error) {
 func munmap(b []byte) (err error) {
 	return unix.Munmap(b)
 }
+
+func sync(b []byte) error {
+	return unix.Msync(b, unix.MS_ASYNC)
+}
diff --git tsdb/fileutil/mmap_windows.go tsdb/fileutil/mmap_windows.go
index b94226412..c54b6b125 100644
--- tsdb/fileutil/mmap_windows.go
+++ tsdb/fileutil/mmap_windows.go
@@ -44,3 +44,7 @@ func munmap(b []byte) error {
 	}
 	return nil
 }
+
+func sync(b []byte) error {
+	return nil
+}
diff --git tsdb/index/index.go tsdb/index/index.go
index a6ade9455..723f2bc73 100644
--- tsdb/index/index.go
+++ tsdb/index/index.go
@@ -552,6 +552,7 @@ func (w *Writer) finishSymbols() error {
 	if err := w.writeAt(w.buf1.Get(), hashPos); err != nil {
 		return err
 	}
+	w.symbolFile.Sync()
 
 	// Load in the symbol table efficiently for the rest of the index writing.
 	w.symbols, err = NewSymbols(realByteSlice(w.symbolFile.Bytes()), FormatV2, int(w.toc.Symbols))



Re: Howto measure pps at forwarding plane

2021-06-10 Thread Claudio Jeker
On Thu, Jun 10, 2021 at 09:23:03AM -, Stuart Henderson wrote:
> On 2021-06-10, Valdrin MUJA  wrote:
> > Hello,
> >
> > I'm trying to figure out how many packets are being forwarded on my
> > OpenBSD firewall.
> > Here a small script i wrote.
> >
> >
> > #!/bin/sh
> >
> >
> > VAL1=`netstat -s | grep 'packets forwarded' | head -1 | awk -F ' ' '{print $1}'`
> >
> > sleep 1
> >
> > VAL2=`netstat -s | grep 'packets forwarded' | head -1 | awk -F ' ' '{print $1}'`
> >
> >
> > echo "$(($VAL2-$VAL1))"
> >
> >
> > But i can not be sure if i am doing the right thing?
> > Can anyone check it please.
> > Thanks.
> >
> 
> If you are only interested in IPv4 then yes that'll do it.
> This would save some cpu cycles though:
> 
> VAL1=`netstat -s | awk '/packets forwarded/ { print $1; exit }'`
> 

And use netstat -sp ip, which limits the number of sysctls netstat makes.
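
Putting the two suggestions together, a minimal sketch could look like the
script below. The helper names forwarded and rate, and the "run" argument,
are made up for this example; the only real commands used are netstat and
awk. The pattern also accepts the singular "packet forwarded".

```shell
#!/bin/sh
# Sketch: print IPv4 packets forwarded per second, once per interval.

# forwarded: read netstat statistics on stdin and print the first
# "packets forwarded" counter.
forwarded() {
	awk '/packets? forwarded/ { print $1; exit }'
}

# rate OLD NEW SECONDS: packets per second (integer division).
rate() {
	echo $(( ($2 - $1) / $3 ))
}

# Only start sampling when called as "script run [interval]", so the
# helpers above can also be sourced on their own.
if [ "${1:-}" = "run" ]; then
	interval=${2:-1}
	old=$(netstat -sp ip | forwarded)
	while sleep "$interval"; do
		new=$(netstat -sp ip | forwarded)
		rate "$old" "$new" "$interval"
		old=$new
	done
fi
```

Run it as "sh pps.sh run 5" (the file name is arbitrary) to get one line
every five seconds. Like the original script, this only covers IPv4.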

-- 
:wq Claudio



Re: openbgpd "depend on"

2021-06-09 Thread Claudio Jeker
On Wed, Jun 09, 2021 at 09:57:32AM +0200, open...@kene.nu wrote:
> Hello,
> 
> Just a question and maybe a suggestion. I am implementing a few DCs that
> use vxlan symmetric routing and hence, layer2 redundancy protocols like
> CARP (and VRRP/HSRP) do not work as intended due to evpn layer2 being the
> technology of choice to announce ARP entries.
> 
> This led me to try out the "depend on carp" functionality that is available
> on openbgpd. It does what I want, partially. It would be much more usable
> if you could define what this functionality does in case of a CARP backup
> state. Currently it puts the bgp neighbor into Idle state. However, it
> would be better if one could define that it should as-path prepend and/or
> add a metric (MED) instead. This way, carp failovers would not rely on the
> tedious and relatively time consuming process of setting up a BGP session
> and announcing prefixes before it can truly be carp master.
> 
> WDYT?

The 'depend on' feature was added to use a CARP cluster as a BGP border
router (e.g. at an IXP that only gives one IP/port). In that case the
backup carp interface is not able to open a TCP session. The backup carp
interface is not reachable and the session would conflict with the master
session.

What you would like is to have 'depend on' for announcements (network
10.0.0.0/24 depend on carp0) or probably as a filter (match to group
uplinks depend on carp set med 100). At least this is how I understand
your request.
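
For comparison, the feature as it exists today is a per-neighbor setting.
A minimal sketch, where the neighbor address, AS number and description
are placeholders and not taken from this thread:

```
# /etc/bgpd.conf fragment (sketch; address, AS number and descr are placeholders)
neighbor 192.0.2.1 {
	remote-as 64496
	descr "ixp-peer"
	depend on carp0		# session stays in Idle while carp0 is backup
}
```

The announcement and filter forms in the paragraph above are only ideas for
what could be added; they are not something bgpd accepts right now.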

-- 
:wq Claudio



Re: pf, relayd, TCP keep alive and NAT, oh my!

2021-06-01 Thread Claudio Jeker
On Tue, Jun 01, 2021 at 10:25:38AM +1000, Cameron Simpson wrote:
> Can I enforce or implement TCP keep alives on a TCP stream via my 
> firewall?
> 
> Background:
> 
> I've got a client with an OpenBSD firewall and a Telstra NBN modem as 
> their modem.
> 
> Their IMAP server is upstream in the cloud (Ubuntu, courier imap). I 
> have this odd problem which I am beginning to suspect is the NBN modem 
> getting bored and dropping its NAT entries. Let me explain...
> 
> At the firewall end I see about 30 ESTABLISHED connections to the IMAP 
> server. At the IMAP server I see over 500, which is about where the IMAP 
> service stops accepting new connections, leading to errors from the 
> client mail readers.
> 
> My current theory is that the IMAP client connections issue the IMAP 
> IDLE command and go passive, waiting for email notifications from the 
> server.  So we have an idle TCP connection across the firewall and 
> across the NBN modem (which NATs).
> 
> My conjecture is that at some point the modem discards idle connection 
> states. (This could just as well happen at any other intermediate 
> stateful router too.) After that event, the client end does something 
> which tries to use the connection, gets an RST from the modem, clean 
> tidyup happens on the client and in the firewall.
> 
> At the server end, none of this is seen and the imapd just sits around 
> idle, never releasing the connection and never stopping the matching 
> daemon process. This gradually rises to hit the server's configured 
> connection limit and it stops accepting new things.
> 
> If I had TCP keep alive turned on, both ends might tidy themselves up.  
> I can't enable that on the clients (various mail readers) or, 
> apparently, on the server configuration. I can't do it in PF because PF 
> just copies packets. I can't seem to do it in relayd either, though that 
> seems the obvious way to intercept the connection for this purpose.
> 
> Any suggestions?

Make sure you use 'block return' at least for the imap connections. This
way when the state is dropped the firewall will issue a RST packet to the
server which will close the connection.

On OpenBSD there is the 'net.inet.tcp.always_keepalive' sysctl to enable
keepalive by default. So that is something you can enable on the IMAP
server to force keep-alive on there. Other systems have similar knobs.
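
As a sketch of the two knobs mentioned above (the comments name the file
each line would go into; the rest of the ruleset is assumed to exist, and
the sysctl line only applies if the host in question runs OpenBSD):

```
# /etc/pf.conf on the firewall: answer blocked TCP packets with an RST
# instead of silently dropping them
block return

# /etc/sysctl.conf on an OpenBSD host: enable TCP keepalive by default
net.inet.tcp.always_keepalive=1
```

The IMAP server in this thread is not OpenBSD, so the equivalent setting
would have to be looked up for that system.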

-- 
:wq Claudio



Re: openrsync manpage error

2021-05-17 Thread Claudio Jeker
On Fri, May 14, 2021 at 12:31:32PM +0300, Irshad Sulaiman wrote:
> Hi
>  Originally I was trying sync usb drive with openbsd box I was getting 
> same error 
> 
> Below is eg: I have two files bar and baz in home dir and dest is destination 
> directory 
> While trying to sync I get error 
> And if I try ‘rsync’ as command I get error not found 
> Iam in 6.9 release with syspatch updated 
> 
> 
> irshad:/home/irshad/test# ls
> bar  baz  dest
> irshad:/home/irshad/test# openrsync -t  bar baz dest/
> openrsync: error: unexpected end of file
> irshad:/home/irshad/test# openrsync -t  bar baz root@192.168.1.1:bar
> root@192.168.1.1's password:
> ash: rsync: not found
> openrsync: error: unexpected end of file
> irshad:/home/irshad/test# rsync
> ksh: rsync: not found
> irshad:/home/irshad/test# uname -a
> OpenBSD openbsd.local 6.9 GENERIC.MP#473 amd64
> irshad:/home/irshad/test#
> 

Yes, this behaviour is expected right now, since we install openrsync
as openrsync but --rsync-path defaults to rsync (as it should).
This is normally not an issue since the remote server most probably has
rsync installed. I also have rsync installed on most of my systems so I
did not notice this.

Right now people should use the rsync package since openrsync is not yet
compatible enough to work well in all scenarios.


> > On 14-May-2021, at 12:02 PM, Claudio Jeker  wrote:
> > 
> > On Fri, May 14, 2021 at 12:44:45AM +0300, Irshad Sulaiman wrote:
> >> Hi 
> >> 
> >> I have modified error in openrsync(1) manpage in Example section isn’t
> >> that ‘openrsync -t'  instead of 'rsync -t ‘
> >> And without --rsync-path= it gives an following error 'openrsync: error:
> >> unexpected end of file’
> > 
> > I did try all three examples and they do work for me without adding
> > --rsync-path=. On which command did you get the unexpected result?
> > Can you share the exact way to reproduce this issue?
> > 
> >> Apologize if Iam wrong 
> >> 
> >> Thanks 
> >> Irshad 
> >> 
> >> 
> >> 
> >> Index: rsync.1
> >> ===
> >> RCS file: /cvs/src/usr.bin/rsync/rsync.1,v
> >> retrieving revision 1.24
> >> diff -u -p -r1.24 rsync.1
> >> --- rsync.1   31 Mar 2021 20:36:05 -  1.24
> >> +++ rsync.1   13 May 2021 21:25:57 -
> >> @@ -234,7 +234,7 @@ with the local
> >> and
> >> .Pa ../src/baz :
> >> .Pp
> >> -.Dl % rsync -t ../src/bar ../src/baz host:dest
> >> +.Dl % openrsync -t --rsync-path=openrsync  ../src/bar ../src/baz host:dest
> >> .Pp
> >> To update the out-of-date local files
> >> .Pa bar
> >> @@ -245,7 +245,7 @@ with the remote files
> >> and
> >> .Pa host:src/baz :
> >> .Pp
> >> -.Dl % rsync -t host:src/bar :src/baz \&.
> >> +.Dl % openrsync -t --rsync-path=openrsync  host:src/bar :src/baz \&.
> >> .Pp
> >> To update the out-of-date local files
> >> .Pa ../dest/bar
> >> @@ -256,7 +256,7 @@ with
> >> and
> >> .Pa baz :
> >> .Pp
> >> -.Dl % rsync -t bar baz ../dest
> >> +.Dl % openrsync -t --rsync-path=openrsync  bar baz ../dest
> >> .\" .Sh DIAGNOSTICS
> >> .Sh SEE ALSO
> >> .Xr ssh 1
> >> 
> > 
> > -- 
> > :wq Claudio
> 

-- 
:wq Claudio



Re: openrsync manpage error

2021-05-14 Thread Claudio Jeker
On Fri, May 14, 2021 at 12:44:45AM +0300, Irshad Sulaiman wrote:
> Hi 
> 
> I have modified error in openrsync(1) manpage in Example section isn’t
> that ‘openrsync -t'  instead of 'rsync -t ‘
> And without --rsync-path= it gives an following error 'openrsync: error:
> unexpected end of file’

I did try all three examples and they do work for me without adding
--rsync-path=. On which command did you get the unexpected result?
Can you share the exact way to reproduce this issue?

> Apologize if Iam wrong 
> 
> Thanks 
> Irshad 
> 
> 
> 
> Index: rsync.1
> ===
> RCS file: /cvs/src/usr.bin/rsync/rsync.1,v
> retrieving revision 1.24
> diff -u -p -r1.24 rsync.1
> --- rsync.1   31 Mar 2021 20:36:05 -  1.24
> +++ rsync.1   13 May 2021 21:25:57 -
> @@ -234,7 +234,7 @@ with the local
>  and
>  .Pa ../src/baz :
>  .Pp
> -.Dl % rsync -t ../src/bar ../src/baz host:dest
> +.Dl % openrsync -t --rsync-path=openrsync  ../src/bar ../src/baz host:dest
>  .Pp
>  To update the out-of-date local files
>  .Pa bar
> @@ -245,7 +245,7 @@ with the remote files
>  and
>  .Pa host:src/baz :
>  .Pp
> -.Dl % rsync -t host:src/bar :src/baz \&.
> +.Dl % openrsync -t --rsync-path=openrsync  host:src/bar :src/baz \&.
>  .Pp
>  To update the out-of-date local files
>  .Pa ../dest/bar
> @@ -256,7 +256,7 @@ with
>  and
>  .Pa baz :
>  .Pp
> -.Dl % rsync -t bar baz ../dest
> +.Dl % openrsync -t --rsync-path=openrsync  bar baz ../dest
>  .\" .Sh DIAGNOSTICS
>  .Sh SEE ALSO
>  .Xr ssh 1
> 

-- 
:wq Claudio



Re: pf firewall bridge0 vether0 blocks DHCP for bridge interfaces connected to Windows

2021-03-10 Thread Claudio Jeker
On Wed, Mar 10, 2021 at 08:40:55PM +0100, da...@hajes.org wrote:
> Hi,
> 
> I did set up OpenBSD router/firewall on PC Engines APU4d4 box.
> 
> First interface is WAN that connects to Internet.
> 
> Remaining three interfaces are bridged with bridge0 via vether0.
> 
> firewall doesn't block LAN/bridge traffic on vether0.
> 
> DHCPD runs on bridge.
> 
> Two Linux hosts (connected to em2 and em3) connect without problem but
> Windows host DHCP requests are blocked on em1.
> 
> I didn't find any info regarding pf and bridging.

Please check the bridge(4) manpage, especially the NOTES section.
 
> set skip on lo0
> set skip on bridge0

This line is useless. Packets never show up on bridge0. You need to add
the physical interfaces and vether0 to your ruleset.
 
> So far I have found a kludge for Windows "set skip on em1"
> 
> Once the above line is present in pf.conf, Win 10 host is allowed to acquire
> IP address. Interesting is that Linux has no issues to acquire IP addresses
> via DHCP.
> 
> Any suggestions, please?
 
You need to fix your pf.conf.

> Is it something screwed up in Windows such as short 3-way-handshake?

I doubt it. Your ruleset is most probably not allowing packets to pass
properly over the bridge. Since you did not share your pf.conf file it is
impossible to give you a better answer. 
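
Without the real ruleset this can only be a rough sketch of the shape such
a pf.conf could take. The interface names are the ones from the original
mail; both pass rules are placeholder assumptions, not a recommendation:

```
# /etc/pf.conf fragment (sketch)
set skip on lo0
# no "set skip on bridge0": packets never show up on bridge0

# assumption: let bridged traffic through on the member ports
pass on { em1 em2 em3 }
# assumption: placeholder, put the real LAN filtering rules on vether0
pass on vether0
```

The NOTES section of bridge(4) explains on which interfaces a bridged
packet is seen by pf, which is what decides where rules are needed.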

-- 
:wq Claudio



Re: iSCSI LUN mount on boot

2021-02-20 Thread Claudio Jeker
On Fri, Feb 19, 2021 at 07:48:25PM -0500, Ashton Fagg wrote:
> I'm curious as to what other folks are doing for mounting iSCSI volumes
> at boot time. I've successfully configured iscsid, and mounting the
> volume manually works as expected.
> 
> I found this article [1] which suggests that hotplugd should be used.
> 
> I also found this old presentation [2] which suggests it should "just
> work" with an entry in /etc/fstab. Maybe I did not get this correct, as:
> 
> .a /mnt/test ffs rw,noatime,nodev,nosuid,nofail 1 2
> 
> causes the machine to go into single-user mode on boot (presumably
> because the iSCSI daemon hasn't yet started).
> 
> Am I missing something here? Is hotplugd the preferred way to accomplish this?

Yeah, the documentation is not great.

You need to add 'net' to the mount options in /etc/fstab for iscsi drives.
Then our rc script will do the right thing and mount these drives late
(after iscsid started).

.a /mnt/test ffs rw,noatime,nodev,nosuid,net 1 2

With that it should work. You can not use iscsi for /, /usr, /var but it
works for /home or /var/www.

-- 
:wq Claudio



Re: Unknown process modifying routing table

2021-02-06 Thread Claudio Jeker
On Sat, Feb 06, 2021 at 02:16:20PM +0100, Otto Moerbeek wrote:
> On Sat, Feb 06, 2021 at 12:18:40PM +, James wrote:
> 
> > I've disabled my VPN on the machine as well as dhclient, connecting via a
> > fixed static IP address and DNS servers. My routing table is still being
> > modified by PID 0 (which I assume to be the kernel) every 30 minutes or so.
> > Ntpd is also disabled.
> > 
> > I have also caught my machine communicating to one the of the IPs via TCP
> > and have a pcap dump from wireshark. No actual data was sent other than a
> > TCP timestamp.
> > 
> > > If your default route is a VPN,
> > > please show how you establish the VPN to be your default route.
> > > 
> > The default route is established manually in a script that is run after the
> > VPN starts. Essentially it does the following:
> > 
> >     route add $VPN_HOST $DEFAULT_GW
> > 
> >     route change default $VPN_HOST
> > 
> > 
> > I do not believe the VPN to be the cause of this problem.
> > 
> > 
> > Any tips on debugging the kernel to track the cause of these route changes
> > would be greatly appreciated.
> > 
> > 
> > Thanks,
> > 
> 
> The kernel uses the routing table to store things like PMTU discovery
> data and ARP entries,
> 

Also showing the route -n monitor output will help to identify what is
going on.

-- 
:wq Claudio



Re: Ask ospfd

2021-02-01 Thread Claudio Jeker
On Tue, Feb 02, 2021 at 12:06:37PM +0700, Adiwangsa Kusumah wrote:
> Dear All,
> 
> I have topology as below:
> 
> UP1         UP2
>    \       /
>     \     /
>     OBSD6.6
>     /     \
>    /       \
> OSPF1     OSPF2
> 
> 
> I use openbgpd to upstream and  openospfd to internal
> I want my openbsd send 0.0.0.0/0 to my ospf (single area)
> 
> At my bgpd.conf  I add
> network 0.0.0.0/0
> 
> At my ospfd I tried to add
> redistribute default
> and/or
> redistribute 0.0.0.0/0
> 
> when i check my ospf, there is no 0.0.0.0 send to my internal network
> 
> ospfctl sh database self-originated
> 
> Link ID          Adv Router       Age   Seq#    Checksum
> 10.xxx.xxx.248   103.xxx.xxx.11   1225  0x8048  0x2471
> 10.xxx.xxx.252   103.xxx.xxx.11   1225  0x804a  0xf797
> 103.xxx.xxx.72   103.xxx.xxx.11   1225  0x8048  0xe1c4
> 103.xxx.xxx.60   103.xxx.xxx.11   1225  0x804a  0x858d
> 103.xxx.xxx.12   103.xxx.xxx.11   1225  0x804a  0x3b05
> 
> Is that any additional configuration at my bgpd.conf or my ospfd.conf?
> Your advice will be appreciated.
> 

In ospfd(8), redistribute requires that the corresponding route is present
in the routing table (route -n get default). bgpd(8) has no such
requirement. So make sure that you have a default route in the kernel
routing table.

-- 
:wq Claudio


