from:"David Gwynne"

Re: sec interface and rdomain

2023-12-25 Thread David Gwynne

which bit doesn't work? the "tunneldomain" command or actual packets moving?

sec transport is provided entirely by the ipsec stack, ie, you configure the 
ipsec SAs associated with the interface to operate in a specific rdomain. sec 
doesn't support configuring that with tunneldomain.

if you tcpdump on the enc and sec interfaces, do you see the packets you're 
expecting?

dlg

> On 24 Dec 2023, at 19:21, Holger Glaess  wrote:
> 
> hi
> 
> i try to use the new sec0 if in this manner.
> 
> ---
> 
> cat /etc/hostname.sec0
> 
> rdomain 10
> 
> inet 172.16.0.1 255.255.255.252 172.16.0.2
> 
> tunneldomain 0
> 
> up
> 
> ---
> 
> 
> but it will not work .
> 
> can sec work with rdomain ?
> 
> 
> happy christmas to all.
> 
> 
> Holger
> 
> 
>

Re: Realtek 8723BE unsupported

2023-12-03 Thread David Gwynne

On Sun, Dec 03, 2023 at 06:02:03PM +0100, Jan Stary wrote:
> (please keep replies on the list)
> 
> On Dec 03 12:08:08, kolip...@exoticsilicon.com wrote:
> > On Sun, Dec 03, 2023 at 02:35:11PM +0100, Jan Stary wrote:
> > > This is current/amd64 on a HP 260 G2 mini PC (dmesg below).
> > > Everything works, except the wifi seems to be unsupported:
> > > 
> > > "Realtek 8723BE" rev 0x00 at pci2 dev 0 function 0 not configured
> > 
> > What does pcidump -v show?
> 
> First of all, pcidump -v (but not pcidump) fucks up re(4):
> 
> rgephy0 detached
> re0 detached
> re0 at pci1 dev 0 function 0 "Realtek 8168" rev 0x10: RTL8168GU/8111GU 
> (0x5080), msi, address 7c:d3:0a:21:eb:f5
> rgephy0 at re0 phy 7: RTL8251 PHY, rev. 0
> re0: cannot create re-stats kstat
> rgephy0 detached
> re0 detached
> re0 at pci1 dev 0 function 0 "Realtek 8168" rev 0x10: RTL8168GU/8111GU 
> (0x5080), msi, address 7c:d3:0a:21:eb:f5
> rgephy0 at re0 phy 7: RTL8251 PHY, rev. 0
> re0: cannot create re-stats kstat
> 
> Is anyone seeing that, i.e. devices detaching
> when they are being probed by pcidump?
> 
> After doing the pcidump -v locally and rebooting to upload, I get this.
> Note that the Realtek 8168 entry seems mangled (related to the above?).

pcidump causing a device to detach is a problem, but the kstat bit is a
separate problem too.

the diff below consolidates the detach code in re(4) and adds the code
to tear the kstat down when the device goes away.

Index: ic/re.c
===================================================================
RCS file: /cvs/src/sys/dev/ic/re.c,v
retrieving revision 1.216
diff -u -p -r1.216 re.c
--- ic/re.c	10 Nov 2023 15:51:20 -0000	1.216
+++ ic/re.c	4 Dec 2023 01:03:30 -0000
@@ -199,6 +199,7 @@ int re_wol(struct ifnet*, int);
 #endif
 #if NKSTAT > 0
 void   re_kstat_attach(struct rl_softc *);
+void   re_kstat_detach(struct rl_softc *);
 #endif
 
 void   in_delayed_cksum(struct mbuf *);
@@ -1128,6 +1129,27 @@ fail_0:
 	return (1);
 }
 
+void
+re_detach(struct rl_softc *sc)
+{
+	struct ifnet	*ifp = &sc->sc_arpcom.ac_if;
+
+#if NKSTAT > 0
+	re_kstat_detach(sc);
+#endif
+
+	/* Remove timeout handler */
+	timeout_del(&sc->timer_handle);
+
+	/* Detach PHY */
+	if (LIST_FIRST(&sc->sc_mii.mii_phys) != NULL)
+		mii_detach(&sc->sc_mii, MII_PHY_ANY, MII_OFFSET_ANY);
+
+	/* Delete media stuff */
+	ifmedia_delete_instance(&sc->sc_mii.mii_media, IFM_INST_ANY);
+	ether_ifdetach(ifp);
+	if_detach(ifp);
+}
 
 int
 re_newbuf(struct rl_softc *sc)
@@ -2608,6 +2630,27 @@ freedma:
 destroy:
 	bus_dmamap_destroy(sc->sc_dmat, re_ks_sc->re_ks_sc_map);
 free:
+	free(re_ks_sc, M_DEVBUF, sizeof(*re_ks_sc));
+}
+
+void
+re_kstat_detach(struct rl_softc *sc)
+{
+	struct kstat *ks = sc->rl_kstat;
+	struct re_kstat_softc *re_ks_sc;
+
+	if (ks == NULL)
+		return;
+
+	kstat_remove(ks);
+	re_ks_sc = ks->ks_ptr;
+	kstat_destroy(ks);
+
+	bus_dmamap_unload(sc->sc_dmat, re_ks_sc->re_ks_sc_map);
+	bus_dmamem_unmap(sc->sc_dmat,
+	    (caddr_t)re_ks_sc->re_ks_sc_stats, sizeof(struct re_stats));
+	bus_dmamem_free(sc->sc_dmat, &re_ks_sc->re_ks_sc_seg, 1);
+	bus_dmamap_destroy(sc->sc_dmat, re_ks_sc->re_ks_sc_map);
 	free(re_ks_sc, M_DEVBUF, sizeof(*re_ks_sc));
 }
 #endif /* NKSTAT > 0 */
Index: ic/revar.h
===================================================================
RCS file: /cvs/src/sys/dev/ic/revar.h,v
retrieving revision 1.7
diff -u -p -r1.7 revar.h
--- ic/revar.h	27 Jul 2010 20:53:39 -0000	1.7
+++ ic/revar.h	4 Dec 2023 01:03:30 -0000
@@ -18,6 +18,7 @@
 
 int	re_intr(void *);
 int	re_attach(struct rl_softc *, const char *);
+void	re_detach(struct rl_softc *);
 void	re_reset(struct rl_softc *);
 int	re_init(struct ifnet *);
 void	re_stop(struct ifnet *);
Index: pci/if_re_pci.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_re_pci.c,v
retrieving revision 1.56
diff -u -p -r1.56 if_re_pci.c
--- pci/if_re_pci.c	11 Mar 2022 18:00:48 -0000	1.56
+++ pci/if_re_pci.c	4 Dec 2023 01:03:30 -0000
@@ -223,19 +223,8 @@ re_pci_detach(struct device *self, int f
 {
 	struct re_pci_softc *psc = (struct re_pci_softc *)self;
 	struct rl_softc *sc = &psc->sc_rl;
-	struct ifnet	*ifp = &sc->sc_arpcom.ac_if;
 
-	/* Remove timeout handler */
-	timeout_del(&sc->timer_handle);
-
-	/* Detach PHY */
-	if (LIST_FIRST(&sc->sc_mii.mii_phys) != NULL)
-		mii_detach(&sc->sc_mii, MII_PHY_ANY, MII_OFFSET_ANY);
-
-	/* Delete media stuff */
-	ifmedia_delete_instance(&sc->sc_mii.mii_media, IFM_INST_ANY);
-	ether_ifdetach(ifp);
-	if_detach(ifp);
+	re_detach(sc);
 
 	/* Disable interrupts */
 	if (sc->sc_ih != NULL)
Index: cardbus/if_re_cardbus.c
===================================================================
RCS file:

Re: Bridging em and vlan

2023-10-05 Thread David Gwynne




> On 6 Oct 2023, at 01:50, David Higgs  wrote:
> 
> Logically, I wanted three hosts in the same broadcast domain (ISP CPE, IoT 
> device, OpenBSD router), so tpmr(4) didn't seem appropriate - was I missing 
> something?

No, you were right to reach for veb in your setup.

Re: Bridging em and vlan

2023-10-05 Thread David Gwynne

> On 5 Oct 2023, at 11:17, David Higgs  wrote:
> 
> On Tue, Oct 3, 2023 at 10:10 AM David Higgs  wrote:
> 
>> On Mon, Oct 2, 2023 at 9:26 AM David Higgs  wrote:
>> 
>>> On Sun, Oct 1, 2023 at 9:13 AM Zé Loff  wrote:
>>> 
>>>> On Sat, Sep 30, 2023 at 11:39:36AM -0400, David Higgs wrote:
>>>>> All of my devices until now have been behind my OpenBSD NAT router, but I
>>>>> recently acquired an Internet of Trash device that I would like to be
>>>>> accessible to the internet (yes, I know).
>>>>>
>>>>> My home configuration uses a Unifi AP to translate my various SSIDs into
>>>>> VLANs which plug into one of my APU em(4) ports.  The IoT thing already has
>>>>> its own dedicated SSID/VLAN, but doesn't enjoy living behind my NAT.
>>>>
>>>> Define "doesn't enjoy".  It absolutely requires a public IP?  It needs
>>>> some ports to be forwarded?  Has some sort of network connection
>>>> detection that fails because some ports are blocked for outgoing
>>>> traffic?
>>>>
>>>
>>> I'm still trying to determine ground truth with manufacturer support.
>>> Port forwarding doesn't seem sufficient.  The device can reach out just
>>> fine but is not remotely controllable as advertised.
>>>
>>>>> Is there a way for me to bridge just one of the vlan(4) logical interfaces
>>>>> with my other em(4) uplink, so that my IoT item can speak DHCP directly
>>>>> with my internet provider?
>>>>
>>>
>>>>> Can this be done with veb/vport or bridge, or will I need to use something
>>>>> more exotic to strip the 802.1q tags before they are sent to my ISP?
>>>>
>>> 
>>> Self-replying here: I don't see many examples of veb(4) use online, but
>>> it seems as if I can add my physical uplink and the IoT VLAN both to a
>>> veb and attach a vport to become my new uplink.  That should be logically
>>> equivalent to putting a three-port switch between my router and my ISP CPE,
>>> with the third port for the IoT device.  Is anyone able to shoot holes in
>>> this or suggest a superior alternative, before I attempt the configuration
>>> later this week?
>>> 
>> 
>> I appreciate the previous replies/cluebats, but my initial attempt was
>> rushed and unsuccessful.
>> 
>> In broad strokes, I created veb0 and added em0, vlan222, and vport0 to
>> it.  Then I tried getting vport0 to speak DHCP with my upstream, but
>> nothing seemed to happen or appear in logs.
>> 
>> I will have to spend more time on this to eliminate the possibility of
>> fat-fingering, remove various confounding variables, and produce a better
>> result/report.
>> 
> 
> For the archives, this worked swimmingly once I paid closer attention to
> what I was doing.  Based on my second attempt, I hadn't put my vport0
> interface up.
> 
> Of course, my ISP isn't handing out more than a single IPv4 address by
> default, so all this has been simply a good learning experience.

For future reference, if you just want to join two ethernet interfaces on an 
openbsd box together you can use tpmr(4). It was almost called xcon(4), short 
for cross-connect.

It's also worth noting the steps taken by the Ethernet stack when it processes 
packets and which drivers can take packets at which point. Let's assume an 
ethernet packet is coming in on a physical interface, em0 in this case.

1. trunk/aggr processing

If em0 is part of trunk/aggr, then those drivers will steal the packet and 
start processing it again as if it was received on the relevant trunk/aggr 
interface.

2. service delimited packet filtering, ie, vlan/svlan handling

If em0 is a parent interface to vlan or svlan interfaces, this is when they get 
taken and processing starts again as if they were received on the virtual 
interfaces.

If no vlan/svlan interface is configured, the packets are now marked as 
"service delimited".

3. bridge processing

This is where bridge/veb/tpmr can take a packet.

4. dropping service delimited packets

This is where vlan/svlan tagged packets that all the preceding 
aggr/trunk/vlan/svlan/bridge/veb/tpmr drivers declined are dropped. The 
exception is packets sent to vlan 0, because vlan 0 isn't real and is only 
used to carry priority information on the wire for the native vlan.

This means that you can set up a bridge/veb/tpmr that forwards vlan tagged 
packets, but optionally slice specific vlans off for other processing by 
configuring a vlan interface with em0 as a parent to take those packets away 
first.

5. carp

If the destination address is for a carp interface on em0, this is the point 
where it's taken away.

6. Ethernet protocol handling

This is when the arp/ipv4/ipv6 protocols are checked and the packets are fed 
into the layer 3 stacks.
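
The order above can be sketched as a toy model. This is only an illustration in Python, not the kernel code: the dictionary layout and the name ether_input_path are made up for the example, and it ignores details such as address checks on bridge members.

```python
# Toy model of the six input steps listed above. All names are hypothetical.

def ether_input_path(pkt, ifc):
    """Return (who ends up with the packet, on which interface)."""
    # 1. trunk/aggr steal the packet and restart on the trunk/aggr interface
    if ifc.get("trunk"):
        return ether_input_path(pkt, ifc["trunk"])

    # 2. a configured vlan/svlan child takes its tag and processing restarts
    vid = pkt.get("vlan")
    if vid is not None:
        child = ifc.get("vlans", {}).get(vid)
        if child is not None:
            return ether_input_path(dict(pkt, vlan=None), child)
    # otherwise the tagged packet is "service delimited"; vlan 0 is exempt
    service_delimited = vid is not None and vid != 0

    # 3. bridge/veb/tpmr can take the packet, tagged or not
    if ifc.get("bridge"):
        return (ifc["bridge"], ifc["name"])

    # 4. tagged packets that nobody claimed are dropped
    if service_delimited:
        return ("drop", ifc["name"])

    # 5. carp takes packets addressed to a carp interface
    if pkt.get("dst") in ifc.get("carp", ()):
        return ("carp", ifc["name"])

    # 6. arp/ipv4/ipv6 handling, ie, the layer 3 stacks
    return ("l3", ifc["name"])


# em0 is a tpmr member, but vlan 222 is sliced off by a vlan interface first
vlan222 = {"name": "vlan222"}
em0 = {"name": "em0", "vlans": {222: vlan222}, "bridge": "tpmr0"}
print(ether_input_path({"vlan": 222}, em0))
print(ether_input_path({"vlan": 100}, em0))
```

In this model the vlan 222 packet reaches layer 3 on vlan222, while every other tag is still forwarded by tpmr0, which is the "slice specific vlans off" arrangement described in step 4.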

Re: OpenBSD Wireguard implementation not copying ToS from inner to outer WG header

2023-09-21 Thread David Gwynne

On Mon, Sep 18, 2023 at 12:47:52PM -, Stuart Henderson wrote:
> On 2023-09-17, Andrew Lemin  wrote:
> > I have been testing the Wireguard implementation on OpenBSD and noticed
> > that the ToS field is not being copied from the inner unencrypted header to
> > the outer Wireguard header, resulting in ALL packets going into the same PF
> > Prio / Queue.
> >
> > For example, ACKs (for Wireguard encrypted packets) end up in the first
> > queue (not the priority queue) despite PF rules;
> >
> > queue ext_iface on $extif bandwidth 1000M max 1000M
> >   queue pri on $extif parent ext_iface flows 1000 bandwidth 25M min 5M
> >   queue data on $extif parent ext_iface flows 1000 bandwidth 100M default
> >
> > match on $extif proto tcp set prio (3, 6) set queue (data, pri)
> >
> > All unencrypted SYNs and ACKs etc correctly go into the 'pri' queue, and
> > payload packets go into 'data' queue.
> > However for Wireguard encrypted packets, _all_ packets (including SYNs and
> > ACKs) go into the 'data' queue.
> >
> > I thought maybe you need to force the ToS/prio/queue values, so I also
> > tried sledgehammer approach;
> > match proto tcp flags A/A set tos lowdelay set prio 7 set queue pri
> > match proto tcp flags S/S set tos lowdelay set prio 7 set queue pri
> >
> > But sadly all encrypted SYNs and ACKs etc still only go into the data queue
> > no matter what.
> > This can be confirmed with wireshark that all ToS bits are lost
> >
> > This results in poor Wireguard performance on OpenBSD.
> 
> Here's a naive untested diff that might at least use the prio internally
> in OpenBSD...
> 
> Index: if_wg.c
> ===================================================================
> RCS file: /cvs/src/sys/net/if_wg.c,v
> retrieving revision 1.29
> diff -u -p -r1.29 if_wg.c
> --- if_wg.c	3 Aug 2023 09:49:08 -0000	1.29
> +++ if_wg.c	18 Sep 2023 12:47:02 -0000
> @@ -1525,6 +1525,8 @@ wg_encap(struct wg_softc *sc, struct mbu
>  	 */
>  	mc->m_pkthdr.ph_flowid = m->m_pkthdr.ph_flowid;
>  
> +	mc->m_pkthdr.pf.prio = m->m_pkthdr.pf.prio;
> +
>  	res = noise_remote_encrypt(&peer->p_remote, &data->r_idx, &nonce,
>  	    data->buf, plaintext_len);
>  	nonce = htole64(nonce); /* Wire format is little endian. */
> 
> 

i think this should go in, ok by me.

implementing txprio and rxprio might be useful too, but requires more
plumbing than i have the energy for now.

Re: Netstat output

2023-09-10 Thread David Gwynne

> On 7 Sep 2023, at 08:00, Steven Shockley  wrote:
> 
> When running netstat -I [interface], what do the "fails" and "errs" columns 
> mean?  When my firewall is under network load, the output interface fails and 
> total errs increases.

fails are the sum of qdrops and errs. qdrops are when the network stack drops 
packets while getting them on or off the driver, and errs are problems the 
driver has with packets. netstat -eI foo0 shows the errors on their own, 
netstat -dI foo0 shows the drops on their own.

if it's qdrops then it's a software performance/configuration problem. if it's 
errs then it's something in the driver reporting errors. if the driver provides 
kstats then you might be able to figure out if it's a dodgy cable or something 
like that.
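
as a rough sketch of that split (python, with function names made up for illustration, this is not how netstat is implemented), you can take the change in each counter between two samples and see which side is moving:

```python
# hypothetical helpers; the inputs are counter deltas you read off
# "netstat -dI foo0" (drops) and "netstat -eI foo0" (errs) yourself

def fails(qdrops, errs):
    """the fails column is the sum of the two underlying counters."""
    return qdrops + errs


def hints(qdrops_delta, errs_delta):
    """return where to look for each counter that increased."""
    out = []
    if qdrops_delta > 0:
        out.append("qdrops: software performance/configuration")
    if errs_delta > 0:
        out.append("errs: driver reporting errors, check kstats/cabling")
    return out


print(fails(3, 2))
print(hints(3, 0))
```

both counters can increase at once, so this returns a hint per counter rather than a single verdict.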

dlg

Re: pf state-table-induced instability

2023-08-31 Thread David Gwynne

On Thu, Aug 31, 2023 at 04:10:06PM +0200, Gabor LENCSE wrote:
> Dear David,
> 
> Thank you very much for all the new information!
> 
> I keep only those parts that I want to react.
> 
> > > It is not a fundamental issue, but it seems to me that during my tests not
> > > only four but five CPU cores were used by IP packet forwarding:
> > the packet processing is done in kernel threads (task queues are built
> > on threads), and those threads could be scheduled on any cpu. the
> > pf purge processing runs in yet another thread.
> > 
> > iirc, the schedule scans down the list of cpus looking for an idle
> > one when it needs to run stuff, except to avoid cpu0 if possible.
> > this is why you see most of the system time on cpus 1 to 5.
> 
> Yes, I can confirm that any time I observed, CPU00 was not used by the
> system tasks.
> 
> However, I remembered that PF was disabled during my stateless tests, so I
> think its purge could not be the one that used CPU05. Now I repeated the
> experiment, first disabling PF as follows:

disabling pf means it doesn't get run for packets in the network stack.
however, once the state purge processing is started it just keeps
running. if you have zero states, there won't be much to process though.

there will be other things running in the system that could account for
the "extra" cpu utilisation.

> dut# pfctl -d
> pf disabled
> 
> And I can still see FIVE CPU cores used by system tasks:

the network stack runs in these threads. pf is just one part of the
network stack.

> 
> load averages:  0.69,  0.29,  0.13    dut.cntrg 14:41:06
> 36 processes: 35 idle, 1 on processor    up 0 days 00:03:46
> CPU00 states:  0.0% user,  0.0% nice,  0.0% sys,  0.2% spin,  8.1% intr, 91.7% idle
> CPU01 states:  0.0% user,  0.0% nice, 61.1% sys,  9.5% spin,  9.5% intr, 19.8% idle
> CPU02 states:  0.0% user,  0.0% nice, 62.8% sys, 10.9% spin,  8.5% intr, 17.8% idle
> CPU03 states:  0.0% user,  0.0% nice, 54.7% sys,  9.1% spin, 10.1% intr, 26.0% idle
> CPU04 states:  0.0% user,  0.0% nice, 62.7% sys, 10.2% spin,  9.8% intr, 17.4% idle
> CPU05 states:  0.0% user,  0.0% nice, 51.7% sys,  9.1% spin,  7.6% intr, 31.6% idle
> CPU06 states:  0.2% user,  0.0% nice,  2.8% sys,  0.8% spin, 10.0% intr, 86.1% idle
> CPU07 states:  0.0% user,  0.0% nice,  0.0% sys,  0.2% spin,  7.2% intr, 92.6% idle
> CPU08 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  8.4% intr, 91.6% idle
> CPU09 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  9.2% intr, 90.8% idle
> CPU10 states:  0.0% user,  0.0% nice,  0.0% sys,  0.2% spin, 10.8% intr, 89.0% idle
> CPU11 states:  0.0% user,  0.0% nice,  0.0% sys,  0.2% spin,  9.2% intr, 90.6% idle
> CPU12 states:  0.0% user,  0.0% nice,  0.2% sys,  0.8% spin,  9.2% intr, 89.8% idle
> CPU13 states:  0.0% user,  0.0% nice,  0.0% sys,  0.2% spin,  7.2% intr, 92.6% idle
> CPU14 states:  0.0% user,  0.0% nice,  0.0% sys,  0.8% spin,  9.8% intr, 89.4% idle
> CPU15 states:  0.0% user,  0.0% nice,  0.0% sys,  0.2% spin,  7.8% intr, 92.0% idle
> Memory: Real: 34M/1546M act/tot Free: 122G Cache: 807M Swap: 0K/256M
> 
> I suspect that top shows an average (in a few seconds time window) and
> perhaps one of the cores from CPU01 to CPU04 are skipped (e.g. because it
> was used by the "top" command?), this is why I can see system load on CPU05.
> (There is even some low amount of system load on CPU06.)
> 
> 
> > > *Is there any way to completely delete its entire content?*
> > hrm.
> > 
> > so i just read the code again. "pfctl -F states" goes through the whole
> > state table and unlinks the states from the red-black trees used for
> > packet processing, and then marks them as unlinked so the purge process
> > can immediately claim them as soon as they're scanned. this means that
> > in terms of packet processing the tree is empty. the memory (which is
> > what the state limit applies to) won't be reclaimed until the purge
> > processing takes them.
> > 
> > if you just wait 10 or so seconds after "pfctl -F states" then both the
> > tree and state limits should be back to 0. you can watch pfctl -si,
> > "systat pf", or the pfstate row in "systat pool" to confirm.
> > 
> > you can change the scan interval with "set timeout interval" in pf.conf
> > from 10s. no one fiddles with that though, so i'd put it back between
> > runs to be representative of real world performance.
> 
> I usually wait 10s between the consecutive steps of the binary search of my
> measurements to give the system a chance to relax (trying to ensure that the
> steps are independent measurements). However, the timeout interval of PF was
> set to 1 hour (using "set timeout interval 3600"). You may ask, why?
> 
> To have some well defined performance metrics, and to define repeatable and
> reproducible measurements, we use the following tests:
> - maximum connection establishment

Re: pf state-table-induced instability

2023-08-30 Thread David Gwynne

sing a safely lower rate than determined by the maximum connection
> establishment rate test.)
> 
> And both tests need to repeat multiple times to acquire statistically
> reliable results.
> 
> As for the explanation of the seemingly deteriorating performance of PF, now
> I understand from your explanation that the "pfctl -F states" command does
> not delete the content of the connection tracking table.
> 
> *Is there any way to completely delete its entire content?*

hrm.

so i just read the code again. "pfctl -F states" goes through the whole
state table and unlinks the states from the red-black trees used for
packet processing, and then marks them as unlinked so the purge process
can immediately claim them as soon as they're scanned. this means that
in terms of packet processing the tree is empty. the memory (which is
what the state limit applies to) won't be reclaimed until the purge
processing takes them.

if you just wait 10 or so seconds after "pfctl -F states" then both the
tree and state limits should be back to 0. you can watch pfctl -si,
"systat pf", or the pfstate row in "systat pool" to confirm.

you can change the scan interval with "set timeout interval" in pf.conf
from 10s. no one fiddles with that though, so i'd put it back between
runs to be representative of real world performance.
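
a toy model of that two-step behaviour, assuming nothing about the real data structures beyond what's described above. the class and field names are invented for illustration, and purge_scan() here covers the whole table in one call, whereas the real scan works through it a bit at a time:

```python
class StateTable:
    """hypothetical stand-in for the pf state table, not the real thing."""

    def __init__(self):
        self.tree = {}        # key -> state, stands in for the red-black trees
        self.allocated = []   # every state still holding memory (what the limit counts)

    def insert(self, key):
        st = {"key": key, "unlinked": False}
        self.tree[key] = st
        self.allocated.append(st)

    def lookup(self, key):
        # packet processing only ever consults the tree
        return self.tree.get(key)

    def flush(self):
        # like "pfctl -F states": unlink from the tree and mark, but don't free
        for st in self.tree.values():
            st["unlinked"] = True
        self.tree.clear()

    def purge_scan(self):
        # the purge processing is what actually gives the memory back
        self.allocated = [st for st in self.allocated if not st["unlinked"]]


t = StateTable()
for k in ("a", "b", "c"):
    t.insert(k)
t.flush()
print(t.lookup("a"), len(t.allocated))   # tree is empty, memory still held
t.purge_scan()
print(len(t.allocated))                  # memory reclaimed after the scan
```

the gap between flush() and purge_scan() is the window where lookups already miss but the state count hasn't dropped yet.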

> (E.g., under Linux, I can delete the connection tracking table of iptables
> or Jool by deleting the appropriate kernel module.)

i can look at making pfctl -F states free the memory up too, but i have
this massive todo list already :(

> Of course, I can delete it by rebooting the server. However, currently I use
> a Dell PowerEdge R730 server, and its complete reboot (including stopping
> OpenBSD, initialization of the hardware, booting OpenBSD and some spare
> time) takes 5 minutes. This is a way too long overhead, if I need to do it
> between every single elementary steps (that is, the steps of the binary
> search) which are in the order of magnitude of 1 minute. :-(

5 minutes of VALUE ADDING. pretty sure dell thinks you should be
grateful for all the amazing work they're doing before you get to the
boot loader.

> 
> (Currently I use the compromise that I reboot the OpenBSD server after
> finishing each binary search.)
> 
> Thank you very much for all your further advice in advance!
> 
> Best regards,
> 
> Gábor
> 
> On 8/29/2023 12:01 AM, David Gwynne wrote:
> > On Mon, Aug 28, 2023 at 01:46:32PM +0200, Gabor LENCSE wrote:
> > > Hi Lyndon,
> > > 
> > > Sorry for my late reply. Please see my answers inline.
> > > 
> > > On 8/24/2023 11:13 PM, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote:
> > > > Gabor LENCSE writes:
> > > > 
> > > > > If you are interested, you can find the results in Tables 18 - 20 of
> > > > > this (open access) paper:https://doi.org/10.1016/j.comcom.2023.08.009
> > > > Thanks for the pointer -- that's a very interesting paper.
> > > > 
> > > > After giving it a quick read through, one thing immediately jumps
> > > > out.  The paper mentions (section A.4) a boost in performance after
> > > > increasing the state table size limit.  Not having looked at the
> > > > relevant code, so I'm guessing here, but this is a classic indicator
> > > > of a hashing algorithm falling apart when the table gets close to
> > > > full.  Could it be that simple?  I need to go digging into the pf
> > > > code for a closer look.
> > > Beware, I wrote it about iptables and not PF!
> > > 
> > > As for iptables, it is really so simple. I have done a deeper analysis of
> > > iptables performance as the function of its hash table size. It is
> > > documented in another (open access) paper:
> > > http://doi.org/10.36244/ICJ.2023.1.6
> > > 
> > > However, I am not familiar with the internals of the other two tested
> > > stateful NAT64 implementations, Jool and OpenBSD PF. I have no idea, what
> > > kind of data structures they use for storing the connections.
> > openbsd uses a red-black tree to look up states. packets are parsed into
> > a key that looks up states by address family, ips, ipproto, ports, etc,
> > to find the relevant state. if a state isnt found, it falls through to
> > ruleset evaluation, which is notionally a linked list, but has been
> > optimised.
> > 
> > > > You also describe how the performance degrades over time.  This
> > > > exactly matches the behaviour we see.  Could the fix be as simple
> > > > as cranking 'set limit states' up to, say, two milltion?  There is
> > > > o

Re: pf state-table-induced instability

2023-08-28 Thread David Gwynne

On Mon, Aug 28, 2023 at 01:46:32PM +0200, Gabor LENCSE wrote:
> Hi Lyndon,
> 
> Sorry for my late reply. Please see my answers inline.
> 
> On 8/24/2023 11:13 PM, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote:
> > Gabor LENCSE writes:
> > 
> > > If you are interested, you can find the results in Tables 18 - 20 of
> > > this (open access) paper: https://doi.org/10.1016/j.comcom.2023.08.009
> > Thanks for the pointer -- that's a very interesting paper.
> > 
> > After giving it a quick read through, one thing immediately jumps
> > out.  The paper mentions (section A.4) a boost in performance after
> > increasing the state table size limit.  Not having looked at the
> > relevant code, so I'm guessing here, but this is a classic indicator
> > of a hashing algorithm falling apart when the table gets close to
> > full.  Could it be that simple?  I need to go digging into the pf
> > code for a closer look.
> 
> Beware, I wrote it about iptables and not PF!
> 
> As for iptables, it is really so simple. I have done a deeper analysis of
> iptables performance as the function of its hash table size. It is
> documented in another (open access) paper:
> http://doi.org/10.36244/ICJ.2023.1.6
> 
> However, I am not familiar with the internals of the other two tested
> stateful NAT64 implementations, Jool and OpenBSD PF. I have no idea, what
> kind of data structures they use for storing the connections.

openbsd uses a red-black tree to look up states. packets are parsed into
a key that looks up states by address family, ips, ipproto, ports, etc,
to find the relevant state. if a state isn't found, it falls through to
ruleset evaluation, which is notionally a linked list, but has been
optimised.

> > You also describe how the performance degrades over time.  This
> > exactly matches the behaviour we see.  Could the fix be as simple
> > as cranking 'set limit states' up to, say, two milltion?  There is
> > one way to find out ... :-)
> 
> As you could see, the highest number of connections was 40M, and the limit
> of the states was set to 1000M. It worked well for me then with the PF of
> OpenBSD 7.1.
> 
> It would be interesting to find the root cause of the phenomenon, why the
> performance of PF seems to deteriorate with time. E.g., somehow the internal
> data structures of PF become "polluted" if many connections are established
> and then deleted?

my first guess is that you're starting to fight against the pf state
purge processing. pf tries to scan the entire state table every 10
seconds (by default) looking for expired states it can remove. this scan
process runs every second, but it tries to cover the whole state table
within 10 seconds. the more states you have the more time this takes, and
this increases linearly with the number of states you have.
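
to put rough numbers on that, here's a back-of-the-envelope sketch in python. the function name is made up and the rounding is an assumption, the point is only that the work per one-second pass is about (number of states / interval):

```python
def states_per_pass(nstates, interval=10):
    """states each one-second pass has to look at so that the whole
    table is covered once per interval (ceiling division)."""
    return -(-nstates // interval)


# the per-pass work grows linearly with the size of the table
for n in (40_000, 2_000_000, 40_000_000):
    print(n, states_per_pass(n))
```

with the default 10 second interval, 2 million states works out to 200000 states examined every second.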

until relatively recently (post 7.2), the scan and gc processing
effectively stopped the world. at work we run with about 2 million
states during business hours, and i was seeing the gc processing take up
approx 70ms a second, during which packet processing didn't really
happen.

now the scan can happen without blocking pf packet processing. it still
takes cpu time, so there is a point that processing packets and scanning
for states will fight each other for time, but at least they're not
fighting each other for locks now.

> However, I have deleted the content of the state table after each elementary
> measurement step using the "pfctl -F states" command. (I am sorry, this
> command is missing from the paper, but it is there in my saved "del-pf"
> file!)
> 
> Perhaps PF developers could advise us, if the deletion of the states
> generate a fresh state table or not.

it marks the states as expired, and then the purge scan is able to take
them and actually free them.

> Could anyone help us in this question?
> 
> Best regards,
> 
> Gábor
> 
> 
> 
> 
> I use binary search to find the highest lossless rate (throughput).
> Especially w
> 
> 
> > 
> > --lyndon
>

Re: ipsec hardware recommendation

2023-08-11 Thread David Gwynne




> On 11 Aug 2023, at 21:08, Marko Cupać  wrote:
> 
> Hi,
> 
> I have star topology network where dozens of spokes communicate with
> other spokes through central hub over GRE tunnels protected with
> transport-mode ipsec.
> 
> This worked great for years, but lately all the locations got bandwidth
> upgrade (spokes: 10Mbit -> 50Mbit, hub: 2x200Mbit -> 2x500Mbit), and I'm
> starting to experience problems.
> 
> Spokes have APU4D4s, and my tests show they can push up to 30Mbit/s of
> ipsec bidirectionally. Hub has HPE DL360g9 with Xeon CPU E5-2623 v4 @
> 2.60GHz and bge NICs, and it seems it can push no more than 200Mbit/s
> of ipsec bidirectionally (I have no chance to test this thoroughly in a
> lab, but what I see in production indicate this strongly).
> 
> Are there any commands I can run which would indicate ipsec traffic is
> being throttled due to hardware being underspecced? top shows CPU is
> more than 50% idle. netstat shows ~1 Ierrs / Ifail (no Oerrs /
> Ifail) on interfaces that deal with ipsec for two months worth of
> uptime.
> 
> Would replacing Xeon box with AMD EPYC 7262 likely result in an
> improvement? Should I go for some NICs other than bge? What hardware do
> I need at Hub location to accomodate ~400Mbit/s of ipsec
> bidirectionally?

From recent experience it looks like IPsec, and the crypto processing in 
particular, still runs under the giant kernel lock. This means you're only 
going to go as fast as a single core can go, and you'll be particularly 
sensitive to contention on that lock. The things you can do Right Now(tm) are:

- upgrade to a system with the fastest single core performance you can afford

- upgrade to -current

the pf purge code has been taken out from under the big kernel lock. if you 
have a lot of pf states, this will give more time to crypto.

- pick faster crypto algorithms

you might already be using the fastest, so maybe this won't help.

- terminate ipsec on multiple hosts

two kernels will be faster than one. however, this adds complexity to the 
network, so not an obvious benefit.

- try wireguard?

if it's a single IP tunnel (ie, one gre(4), and not egre(4)) between the 
hubs and spokes then wg might be simpler and faster. simpler because wg has 
fewer layers than gre over ipsec, and faster cos it should be able to do 
crypto in parallel.


in the future i'm sure the ipsec stack will improve, but it's hard work that 
takes time.

dlg

> 
> Thank you in advance,
> 
> 
> -- 
> Before enlightenment - chop wood, draw water.
> After  enlightenment - chop wood, draw water.
> 
> Marko Cupać
> https://www.mimar.rs/
>

Re: veb and vport on apu2 -- config feedback

2023-06-23 Thread David Gwynne

looks good to me after a quick read.

> On 23 Jun 2023, at 12:15, Amarendra Godbole  
> wrote:
> 
> I am planning to experiment with veb on my PC Engines apu2e4 board. It
> has three ports (em0, 1 and 2). Current configuration has em0 hooked
> up to cable modem, while em1 and em2 are internal LAN. I don't have a
> good ability to troubleshoot via a serial console, since the apu board
> sits in the garage on top of a cabinet -- running serial cable to a
> laptop is challenging, though not impossible. So I am looking for
> feedback so as to keep this troubleshooting time minimal.
> 
> Any feedback is welcome. Configs below. Thanks in avance.
> 
> -Amarendra
> 
> $ cat hostname.em1
> mtu 9000
> up
> 
> $ cat hostname.em2
> mtu 9000
> up
> 
> $ cat hostname.veb0
> add em1
> add em2
> add vport0
> link0
> up
> 
> $ cat hostname.vport0
> inet 192.168.1.1 255.255.255.0 192.168.1.255
> mtu 9000
> group internal
> up
> 
> $ cat pf.conf
> ruckus= "192.168.1.10"
> 
> table  { 0.0.0.0/8 10.0.0.0/8 127.0.0.0/8 169.254.0.0/16 \
>   172.16.0.0/12 192.0.0.0/24 192.0.2.0/24 224.0.0.0/3 \
>   192.168.0.0/16 198.18.0.0/15 198.51.100.0/24\
>   203.0.113.0/24 }
> 
> set block-policy drop
> set loginterface egress
> set skip on lo0
> match in all scrub (no-df random-id max-mss 1440)
> 
> # spoof protection
> antispoof quick for egress
> block in from no-route
> block in quick from urpf-failed
> 
> # block martians!
> block in quick on egress from  to any
> block return out quick on egress from any to 
> 
> # default deny
> block all
> 
> # allow icmp
> match in on egress inet proto icmp icmp-type { echoreq } tag ICMP_IN
> block drop in on egress proto icmp
> pass in proto icmp tagged ICMP_IN max-pkt-rate 100/10
> pass in on egress inet proto icmp icmp-type { 3 code 4, 11 code 0}
> 
> pass out quick on egress inet from internal nat-to (egress)
> pass out quick inet
> pass in on internal inet
> 
> # block dns queries that are not destined for our dns server.
> block return in quick on internal proto { udp tcp } to ! internal port
> { 53 853 }
> 
> # block Ruckus AP from "phoning home"
> block in quick on internal from $ruckus
>

Re: Using pf route-to to Route Network Traffic a tun interface and Replying from it

2023-06-05 Thread David Gwynne

On Tue, May 30, 2023 at 06:07:32PM +0300, Nick Andersen wrote:
> Hi Folks,

hi.

> 
> I am writing to seek assistance regarding an issue I am experiencing in
> trying to route my Personal Computer's network traffic to a TUN interface.
> My objective is to modify some of its content and subsequently return the
> traffic back.
> 
> So far, I have successfully created a TUN interface using the following
> configuration:
> 
> andersen@pc% ifconfig tun8 inet 172.16.122.1/32 172.16.122.2 up
> andersen@pc% ifconfig tun8
> tun8: flags=8051 mtu 1500
> inet 172.16.122.1 --> 172.16.122.2 netmask 0x
> 
> 
> Subsequently, I have also inspected the primary Ethernet interface, em0, as
> follows:
> 
> 
> andersen@pc % ifconfig em0
> em0: flags=8863 mtu 1500
> options=6463
> ether xx:xx:xx:xx:xx:xx
> inet 192.168.1.128 netmask 0xff00 broadcast 192.168.1.255
> nd6 options=201
> media: autoselect
> status: active
> 
> 
> 
> And I've updated pf.conf;
> 
> set skip on { lo0 tun8 }
> 
> ext_if="em0"
> tun_if="tun8"
> 
> # allow dns
> pass in log quick on $ext_if inet proto { tcp udp } from any to any port 53
> pass out log quick on $ext_if  inet proto { tcp udp } from any to any port
> 53
> 
> pass in log quick on $ext_if
> pass out log quick on $ext_if route-to (tun8 (tun8)) no state

the syntax and semantics for route-to changed before 6.9. are you
running a stable release (ie, 7.2 or preferably 7.3)?

the pf.conf syntax changed so that instead of routing to an interface
with an optional IP address, you route-to a destination IP address. the
semantic change is that route-to relies on states. so you probably want

  pass out log quick on $ext_if route-to 172.16.122.2

because pfctl will resolve interface names to ips, you can also use
this:

  pass out log quick on $ext_if route-to tun8:peer

> pass out log quick on $tun_if reply-to (em0 (em0))

you have "set skip on tun8" above, which means this rule won't run.

however, you have a problem where you don't want route-to to
happen to the packets that are being reinjected by your program. i think
the least worst way to do that in this situation is to use the
following:

  pass out log quick on $ext_if received-on $tun_if
  pass out log quick on $ext_if route-to $tun_if:peer

if you want your program to handle packets in both directions on a
connection, you could have rules like this:

  pass out log quick on $ext_if reply-to $tun_if:peer received-on $tun_if
  pass out log quick on $ext_if route-to $tun_if:peer

you wont be able to tell the direction of the packets apart if they
all go through the one tun interface though. if you route-to tun8
and reply-to another interface (eg, tun9), then you will be able
to differentiate them based on which tun interface you read them
from.

divert(4) sockets might also work for you depending on what you're
doing. if you're just monitoring packets then there's also dup-to
and bpf/tcpdump.

> --
> 
> I implemented a small C program that reads packets from /dev/tun8 and
> writes them back to the same device. During the writing phase, I have
> attempted to add a 4-byte TUN header (with AF_INET byte). The issue arises
> when I enable pf, as my connectivity ceases to function. I suspect that the
> problem may be linked to the reply-to rule. I can accurately read all
> network packets, but my network connectivity is disrupted when I activate
> pf.
> 
> Are there any thoughts about what I'm doing wrong?

id leave pf enabled and just change rules.
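
for reference, the 4-byte header mentioned above is the address family
as a 32-bit integer in network byte order, placed in front of every
packet read from or written to tun(4). a minimal python sketch of that
framing follows. the helper names tun_wrap and tun_unwrap are made up
for illustration, and it only deals with AF_INET.

```python
import socket
import struct

TUN_HDR_LEN = 4  # 32-bit address family, network byte order


def tun_wrap(ip_packet, af=socket.AF_INET):
    """prepend the tunnel header before writing a packet to the device."""
    return struct.pack("!I", af) + ip_packet


def tun_unwrap(frame):
    """split a frame read from the device into (address family, packet)."""
    if len(frame) < TUN_HDR_LEN:
        raise ValueError("frame shorter than the tunnel header")
    (af,) = struct.unpack("!I", frame[:TUN_HDR_LEN])
    return af, frame[TUN_HDR_LEN:]
```

a program reading /dev/tun8 would strip the header with tun_unwrap(),
modify the ip packet, and put the header back with tun_wrap() before
writing it out again.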

> 
> Thanks!
> 
> Here is a sample from pflog;
> 
> andersen@pc% sudo tcpdump -nettti pflog0
> 
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> 
> listening on pflog0, link-type PFLOG (OpenBSD pflog file), capture size 246
> bytes
> 
>  00:00:00.00 rule 6/0(match): pass out on em0: 192.168.1.128.52553 >
> 17.248.173.70.443: Flags [S], seq 1289016582, win 65535, options [mss
> 1460,nop,wscale 6,nop,nop,TS val 1617830816 ecr 0,sackOK,eol], length 0
> 
>  00:00:00.005332 rule 6/0(match): pass out on em0: 192.168.1.128.52569 >
> 17.248.172.107.443: Flags [S], seq 1886843796, win 65535, options [mss
> 1460,nop,wscale 6,nop,nop,TS val 386220006 ecr 0,sackOK,eol], length 0
> 
>  00:00:00.178005 rule 6/0(match): pass out on em0: 192.168.1.128.52554 >
> 17.248.172.208.443: Flags [S], seq 3787270145, win 65535, options [mss
> 1460,nop,wscale 6,nop,nop,TS val 1898437799 ecr 0,sackOK,eol], length 0
> 
>  00:00:00.079092 rule 6/0(match): pass out on em0: 192.168.1.128.52570 >
> 17.248.173.83.443: Flags [S], seq 606598735, win 65535, options [mss
> 1460,nop,wscale 6,nop,nop,TS val 2940552698 ecr 0,sackOK,eol], length 0
> 
>  00:00:00.174093 rule 6/0(match): pass out on em0: 192.168.1.128.52555 >
> 17.248.172.172.443: Flags [S], seq 1449413825, win 65535, options [mss
> 1460,nop,wscale 6,nop,nop,TS val 212268682 ecr 0,sackOK,eol], length 0
> 
>  00:00:00.079048 rule 6/0(match): pass out on em0: 192.168.1.128.52571 >
> 17.248.172.135.443: Flags [S], seq 1322915507, win

Re: Route based IPsec

2023-05-31 Thread David Gwynne




> On 31 May 2023, at 18:33, Claudio Jeker  wrote:
> 
> On Wed, May 31, 2023 at 08:35:45AM +1000, David Gwynne wrote:
>> 
>> 
>>> On 27 May 2023, at 21:40, Stuart Henderson  
>>> wrote:
>>> 
>>> On 2023-05-27, Valdrin MUJA  wrote:
>>>>   Does OpenBSD have routed based IPsec support?
>>> 
>>> Not yet.
>> 
>> while you wait, it might be possible to configure a gif tunnel protected
>> by ipsec transport mode.
>> 
> 
> The annoying bit with gif tunnels in transport mode is the need for static
> IPs on both sides of the tunnel. I ended up tunneling gif in tunnel mode
> because of that.

that's an annoying thing about gif, even without ipsec in the mix.

should i make it possible to specify an interface as the source of local 
addresses on tunnels?

Re: Route based IPsec

2023-05-30 Thread David Gwynne




> On 27 May 2023, at 21:40, Stuart Henderson  wrote:
> 
> On 2023-05-27, Valdrin MUJA  wrote:
>>Does OpenBSD have routed based IPsec support?
> 
> Not yet.

while you wait, it might be possible to configure a gif tunnel protected by 
ipsec transport mode.

dlg

Re: Usage of pf(4) with tap(4) and veb(4)

2023-05-26 Thread David Gwynne

On Thu, May 25, 2023 at 02:11:29AM +0200, Joel Carnat wrote:
> Hi,
> 
> I'd like to confirm I understood how pf works in a mixed veb/vport/tap
> environment. I'm using OpenBSD 7.3/amd64 (if that matters).
> 
> I have a physical host that runs services (relayd, httpd...) the "classical"
> way and also provides VM using vmd. I have a couple of public IPs that are
> either affected to the host (via vportN) or to some VMs (via tapN). I'm
> doing all the IP filtering on the host's pf (because some VMs are Linux and
> I don't know iptables).
> 
> Here's a sum'up of my configuration:
>   # cat /etc/hostname.em0
>   up
>   # cat /etc/hostname.vport0
>   rdomain 0
>   inet aa.bb.cc.5 255.255.255.0
>   !route -n add -inet default aa.bb.cc.1
>   up
>   # cat /etc/hostname.vport1
>   rdomain 1
>   inet aa.bb.cc.6 255.255.255.0
>   !route -T 1 -n add -inet default aa.bb.cc.1
>   up
>   # cat /etc/hostname.tap2
>   rdomain 2
>   up
>   # cat /etc/hostname.veb0
>   add em0
>   add vport0
>   add vport1
>   add tap2
>   up
>   # cat /etc/vm.conf
>   (...)
>   switch "wan"   { interface veb0 }
>   (...)
>   vm linux {
>   (...)
> interface tap2 {
>   rdomain 2
>   switch "wan"
>   # configure enp0s2 with aa.bb.cc.7/24
> }
>   (...)
> 
> My initial pf configuration looked like:
>   block return log
>   pass on lo
>   pass in on vport0 proto tcp to vport0 port ssh
>   pass in on vport1 proto tcp to vport1 port { http, https }
>   pass in on tap2   proto tcp to aa.bb.cc.7 port ssh
>   pass out
> 
> This filters properly on vport0 and vport1. But nothing is filtered on tap2:
> the http service running in the VM is accessible via aa.bb.cc.7.
> 
> First question: is it expected that pf doesn't filter inbound traffic on a
> tap interface by default? Or is it specific to the fact that tap2 belongs to
> veb0?

it's because tap is part of a veb, as per the first part of the veb
manpage.

> After re-reading veb(4), I ran `ifconfig veb0 link1` and could achieve the
> wished filtering by modifying my pf configuration as such:
>   block return log
>   pass on lo
>   pass on em0
>   pass in  on vport0 proto tcp to vport0 port ssh
>   pass in  on vport1 proto tcp to vport1 port { http, https }
>   pass out on tap2   proto tcp to aa.bb.cc.7 port ssh
>   pass out on vport0
>   pass out on vport1
>   pass in  on tap2
> 
> Second question: is this the proper way to configure veb0 and pf or is there
> a "better" way of doing the filtering?

no, that's what link1 is for.

just note the following:

- you want to avoid pf matching a packet against the same state multiple
times. this means you want to avoid pf running a packet in the
same direction in the same rdomain.

- pf running on ports in a veb (except vports) uses the rdomain from the
veb interface, it ignores the rdomain on ports (except vport).

- if veb can't find a single outgoing interface for a packet, it will run it
through pf on veb. ie, broadcast, multicast, and unknown unicast can be
matched in pf with "pass out on veb0" because those packets will be
flooded to all ports.

so my advice is to avoid having vports and vebs in the same rdomain.

Re: small issue with mpe

2023-05-23 Thread David Gwynne

> On 23 May 2023, at 17:40, Claudio Jeker  wrote:
> 
> On Tue, May 23, 2023 at 07:09:51AM -, Stuart Henderson wrote:
>> On 2023-05-23, David Gwynne  wrote:
>>> On Sat, May 20, 2023 at 09:44:51AM +0200, Holger Glaess wrote:
>>>> hi
>>>> 
>>>> 
>>>> looks like that the patch works , but should not print "tunneldomain"
>>>> instead of "rdomain" ?
>>> 
>>> that's an interesting question.
>>> 
>>> ifconfig does not aim to produce output that can then be used as input
>>> for ifconfig again. printing it as rdomain is at least consistent with
>>> how it's printed on the tunnel: line for things like etherip and gif,
>>> and i guess the assumption is you can figure out that it's tunneldomain
>>> from the context.
>> 
>> things are a bit inconsistent here - doesn't this actually take an rtable
>> not an rdomain? (wg uses and prints "wgrtable" for what I think is the
>> equivalent thing).
> 
> I think this is a general issue with tunneldomain. It should be
> tunneltable since it used for two things. The route lookup of the tunnel
> endpoints and to alter the mbufs rdomain on encapsulation / decapsulation. 
> At least in theory this is how it should work but someone needs to verify
> that all drivers really behave like this.

ifconfig drv0 rdomain specifies which rdomain can send packets into the 
interface, and which rdomain the packets coming out of the interface will use. 
this is the same on all interfaces whether they're tunnels or not.

ifconfig drv0 tunneldomain specifies the rdomain that the encapsulated packets 
operate in.

rdomain and tunneldomain (if supported) are always in effect and in the same 
way. packets sent from an rdomain out a tunnel will get the tunnel headers 
added to the packet and the rdomain rewritten to the tunneldomain value (which 
could be 0). encapsulated packets from the remote tunnel endpoint have to match 
the tunneldomain before the tunnel interface will match them and decapsulate 
them, and once they're decapsulated the rdomain on the packet is set to the 
interface rdomain value.
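
as a sketch of the rules above, here is a small python model. the
Packet and Tunnel classes are hypothetical and only mimic the rdomain
bookkeeping, they are not how the kernel is structured. the numbers are
taken from the mpe1 example in this thread (rdomain 200, tunneldomain
20).

```python
from dataclasses import dataclass


@dataclass
class Packet:
    rdomain: int
    payload: bytes
    encapsulated: bool = False


@dataclass
class Tunnel:
    rdomain: int       # ifconfig drv0 rdomain N
    tunneldomain: int  # ifconfig drv0 tunneldomain N

    def output(self, pkt):
        """a packet sent from the interface rdomain out the tunnel."""
        if pkt.rdomain != self.rdomain:
            raise ValueError("packet is not in the interface rdomain")
        # add tunnel headers, rewrite the rdomain to the tunneldomain
        return Packet(self.tunneldomain, pkt.payload, encapsulated=True)

    def input(self, pkt):
        """an encapsulated packet arriving from the remote endpoint."""
        if not pkt.encapsulated or pkt.rdomain != self.tunneldomain:
            return None  # this tunnel interface does not match it
        # decapsulate, then set the rdomain to the interface rdomain
        return Packet(self.rdomain, pkt.payload)
```

so with rdomain 200 and tunneldomain 20, output() produces an
encapsulated packet in rdomain 20, and input() only accepts
encapsulated packets that arrive in rdomain 20.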

dlg

> 
> -- 
> :wq Claudio
>

Re: small issue with mpe

2023-05-22 Thread David Gwynne

On Sat, May 20, 2023 at 09:44:51AM +0200, Holger Glaess wrote:
> hi
> 
> 
> looks like that the patch works , but should not print "tunneldomain"
> instead of "rdomain" ?

that's an interesting question.

ifconfig does not aim to produce output that can then be used as input
for ifconfig again. printing it as rdomain is at least consistent with
how it's printed on the tunnel: line for things like etherip and gif,
and i guess the assumption is you can figure out that it's tunneldomain
from the context.

> 
> Holger
> 
> 
> /usr/src/sbin/ifconfig 164>ifconfig mpe1
> mpe1: flags=51 rdomain 200 mtu 1500
>         index 82 priority 0 llprio 3
>         encap: txprio 0 rxprio packet
>         mpls: label 200
>         groups: mpe
>         inet 172.16.2.201 --> 0.0.0.0 netmask 0x
> 09:42:35 Sat May 20
> you are on farin as root
> /usr/src/sbin/ifconfig 165>./ifconfig mpe1
> mpe1: flags=51 rdomain 200 mtu 1500
>         index 82 priority 0 llprio 3
>         encap: txprio 0 rxprio packet
>         mpls: label 200 rdomain 20
>         groups: mpe
>         inet 172.16.2.201 --> 0.0.0.0 netmask 0x
> 09:42:39 Sat May 20
> you are on farin as root
> 
> 
> On 20.05.23 02:22, David Gwynne wrote:
> > +   if (ioctl(sock, SIOCGLIFPHYRTABLE, (caddr_t)&ifr) == 0 &&
> > +   (rdomainid != 0 || ifr.ifr_rdomainid != 0))
> > +   printf(" rdomain %d", ifr.ifr_rdomainid);
> > +

Re: small issue with mpe

2023-05-19 Thread David Gwynne

On Fri, May 19, 2023 at 04:44:38PM +0200, Holger Glaess wrote:
> hi
> 
> 
> if you do an "ifconfig mpeX" , will not show the configured tunneldomain.
> 
> /etc 59>ifconfig mpe1
> mpe1: flags=51 rdomain 200 mtu 1500
>         index 82 priority 0 llprio 3
>         encap: txprio 0 rxprio packet
>         mpls: label 200
>         groups: mpe
>         inet 172.16.2.201 --> 0.0.0.0 netmask 0x
> 
> /etc 60>cat hostname.mpe1
> rdomain 200
> inet 172.16.2.201/32
> -inet6
> mplslabel 200
> tunneldomain 20
> up
> 
> tunneldomain option works

looks like ifconfig only tries to display the tunneldomain if the
interface also provides "tunnel" addresses, which mpe does not do
because it uses labels instead.

this has the mpls line in ifconfig try and print the rdomain too. can you
try it?

Index: ifconfig.c
===
RCS file: /cvs/src/sbin/ifconfig/ifconfig.c,v
retrieving revision 1.462
diff -u -p -r1.462 ifconfig.c
--- ifconfig.c  8 Mar 2023 04:43:06 -   1.462
+++ ifconfig.c  20 May 2023 00:17:43 -
@@ -3982,6 +3985,10 @@ mpls_status(void)
} else
printf("\tmpls: label %u", shim.shim_label);
 
+   if (ioctl(sock, SIOCGLIFPHYRTABLE, (caddr_t)&ifr) == 0 &&
+   (rdomainid != 0 || ifr.ifr_rdomainid != 0))
+   printf(" rdomain %d", ifr.ifr_rdomainid);
+
pwe3_neighbor();
pwe3_cword();
pwe3_fword();

Re: Will tags length influence the performance in PF?

2023-04-21 Thread David Gwynne

inside the kernel tags are given numeric identifiers, and these numbers are 
used everywhere. the length of the tag name doesnt affect performance.
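
roughly, the idea looks like this. it is a hypothetical python model of
name-to-id interning, not pf's actual code: the name is turned into a
number once when a rule is loaded, and matching a packet against a rule
is an integer comparison.

```python
class TagTable:
    """toy model: map each tag name to a small integer, once per name."""

    def __init__(self):
        self._ids = {}

    def name2id(self, name):
        # done when a rule is loaded, not for every packet
        if name not in self._ids:
            self._ids[name] = len(self._ids) + 1
        return self._ids[name]


def rule_matches(rule_tag_id, packet_tag_id):
    # per-packet work is one integer comparison, whatever the name length
    return rule_tag_id == packet_tag_id
```

in this model "A" and "TEST_TEST_TEST_TEST_TEST" each end up as one
small integer, so comparing them costs the same.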

> On 21 Apr 2023, at 04:10, Cristian Danila  wrote:
> 
> Hello Misc,
> 
> I have a technical question in regards to PF tags.
> I was always wondering if the length of tags matters
> or not in terms of performance.
> For example will PF use the same effort to match a tag
> TEST_TEST_TEST_TEST_TEST as it would do for a tag A?
> I am wondering if PF internally would just translate initially all
> tags in a set of optimized id's and later will use only those id's
> when tag filtering is used.
> 
> I appreciate your answer.
> With respect,
> Claudiu
>

Re: veb Interface Max Cache Size Restrict

2023-04-18 Thread David Gwynne

On Tue, Apr 18, 2023 at 07:51:08PM +, Samuel Jayden wrote:
> Hello,
> I have one veb interface in OpenBSD 7.2 and 5 ethernet ports are paired
> with this veb. As I understand from the ifconfig output, 4096 mac address
> cache values can be kept in this veb interface .
> 
> ifconfig veb10
> veb10: flags=8843
> index 12 llprio 3
> groups: veb
> em3 flags=3
> port 4 ifpriority 0 ifcost 0
> em0 flags=3
> port 1 ifpriority 0 ifcost 0
> em1 flags=3
> port 2 ifpriority 0 ifcost 0
> ix3 flags=3
> port 8 ifpriority 0 ifcost 0
> ix2 flags=3
> port 7 ifpriority 0 ifcost 0
> Addresses (max cache: 4096, timeout: 240):
> 2c:f0:5d:73:f8:c4 em1 0 flags=0<>
> 
> 
> When I tried to extend this limit value with the command "ifconfig veb10
> maxaddr 4097", I got the following error message:
> "ifconfig: veb10: Invalid argument"
> The maximum value I can give without this error message is 4096. Isn't this
> value a bit narrow?

maybe. it seemed pretty high when i made it up.

> I have tested that the mac addresses of the connected devices are not
> recorded in the veb interface after exceeding the limit.
> 
> I want to switch from Cisco device to OpenBSD in a place where there are
> more than 8 thousand MAC addresses, but I need to exceed this max cache
> size value.
> How can I increase this max cache size value 8192 or higher value?

you change 4096 to a bigger number in the code.

Index: if_etherbridge.c
===
RCS file: /cvs/src/sys/net/if_etherbridge.c,v
retrieving revision 1.7
diff -u -p -r1.7 if_etherbridge.c
--- if_etherbridge.c 5 Jul 2021 04:17:41 -   1.7
+++ if_etherbridge.c 19 Apr 2023 02:25:54 -
@@ -675,7 +676,7 @@ int
 etherbridge_set_max(struct etherbridge *eb, struct ifbrparam *bparam)
 {
if (bparam->ifbrp_csize < 1 ||
-   bparam->ifbrp_csize > 4096) /* XXX */
+   bparam->ifbrp_csize > 16384) /* XXX */
return (EINVAL);
 
/* commit */

Re: Using veb instead of bridge at vpls section

2023-03-28 Thread David Gwynne




> On 21 Mar 2023, at 05:05, Valdrin MUJA  wrote:
> 
> Hello folks,
> 
> I have successfully configured the VPLS by following the instruction on 
> https://pawa.lt/posts/2018/01/vpls-with-openbsd/.
> Everything worked like a charm.
> 
> But when I tried to use veb(4)  instead of bridge(4) , I got 'Device Busy' 
> error.

Can you give some context about where you got this error?

> I'm guessing ldpd(8) doesn't support the veb interface. Is it true?
> I'm just trying to be sure. If this is the case, I hope one day ldpd(8) will 
> get veb(4) support. Thanks for these great efforts.
>

Re: How to use VM as router to other VMs or Host?

2023-03-13 Thread David Gwynne

On Sat, Mar 11, 2023 at 11:30:52AM +0100, lisper.drea...@tutanota.com wrote:
> Hi Misc,
> I'm trying to use alpine linux as a router/gateway to my OpenBSD machine.
> I can set up alpine linux with vmm and configure its network, no problem so 
> far.
> I'd like my host network traffic to get in and out through my alpine vm.
> The idea is to use alpine as a vpn to my host browser.
> I've been able to get PIA wireguard working on alpine vm, and I would like to 
> redirect my host browser through it.
> 
> Any reference anywhere?

this is a surprising request. i spend an annoying amount of time trying
to get other operating systems to do network things that i feel are easy
in openbsd rather than the other way round. with that in mind, openbsd
does support wireguard, so if PIA is using vanilla wg then it should be
possible to get openbsd to do the vpn bit for you.

however, it should be possible to do what you want. the high level idea
is to give your alpine VM a pair of network interfaces, one for the vm
to connect to PIA with, and one for your browser to talk to. the
external (PIA facing) interface needs to be connected to the outside
world like you'd connect any other vm to the net. setting it up on a
switch (veb) with your physical interface for example.

the other interface will also appear as a tap interface, but if you're
only using the vpn it provides for a browser, then your browser can talk
to the tap interface directly without needing another switch or anything
like that. just configure an ip on the tap interface on the same subnet
as the vm.

once the vm is set up and connected, then you need to get your browser
to route to the vm. if browsing is the only thing you do on this
machine, then setting your default route to the ip address of the vm
over the tap interface will work.

if you want only the browser to use the vpn, then you can put the
tap interface into a separate rdomain and run the browser in that
rdomain. you'll have to be careful about other traffic the browser
generates for this to work, in particular dns traffic cos the browser
will be using your /etc/resolv.conf to find nameservers. otherwise, you
can use route-to in pf, but you'll need some way for pf to identify the
browser traffic (pass out to port { http https } route-to $alpine_vm
maybe).

doing the vpn on openbsd rather than in the vm sounds less complicated
to me. good luck.

dlg

Re: athn on a bridge

2023-02-09 Thread David Gwynne

On Thu, Feb 09, 2023 at 11:44:56AM -, Stuart Henderson wrote:
> On 2023-02-08, Martin Kjær Jørgensen wrote:
> >
> > When configuring the athn0 with no IP address, and adding the interface to a
> > bridge0 interface along with the em1 device and a vether0 device, clients
> > still connects fine to athn0 SSID but when clients ask for IP over DHCP,
> > ethernet frames does not propagate to vether0 where the dhcpd listens.
> 
> Don't expect great performance, but athn hostap ought to work.
> 
> Likely you will have better luck replacing the bridge interface with
> veb(4) and vether with vport(4), e.g.
> 
> hostname.veb0:
> add vport0 add athn0 add em1
> up
> 
> hostname.vport0:
> inet 192.168.1.1/25
> up

i was just going to suggest this...

for the benefit of the mailing list, bridge(4) effectively turns every
interface you add to a bridge into two ports: one that faces the IP
stacks and another that faces the physical connection it has. if this
sounds confusing, then yes, it is.

with veb and vport, only vport interfaces provide a connection between
the ip stack and the layer2 network built out of veb and physical interfaces.

if you do try this with veb/vport, you should be able to tcpdump on
athn0 and see that dhcp requests come in, then tcpdump on veb should
show them crossing the bridge, and then tcpdump on vport0 should show
them going toward the network stack and dhcpd.

also, your bridge(4) interface isn't UP in the output below, so it's
not going to carry packets from athn0 to vether0.

> 
> > vether0: flags=8943 mtu 1500
> > lladdr fe:e1:ba:d0:cd:4a
> > index 9 priority 0 llprio 3
> > groups: vether
> > media: Ethernet autoselect
> > status: active
> > inet 192.168.1.1 netmask 0xff80 broadcast 192.168.1.127
> > athn0: flags=8943 mtu 1500
> > lladdr 00:26:82:61:87:c9
> > index 5 priority 4 llprio 3
> > groups: wlan
> > media: IEEE802.11 autoselect mode 11g hostap
> > status: active
> > ieee80211: nwid TEST chan 2 bssid 00:26:82:61:87:c9 -58dBm wpakey 
> > wpaprotos wpa2 wpaakms psk wpaciphers ccmp wpagroupcipher ccmp
> > bridge0: flags=0<> mtu 1500
> > index 8 llprio 3
> > groups: bridge
> > priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto 
> > rstp
> > designated: id 00:00:00:00:00:00 priority 0
> > athn0 flags=3
> > port 5 ifpriority 0 ifcost 0
> > em1 flags=3
> > port 2 ifpriority 0 ifcost 0
> > vether0 flags=3
> > port 9 ifpriority 0 ifcost 0
> > Addresses (max cache: 100, timeout: 240):
>

Re: OpenBSD as a transparent switch filter

2023-01-24 Thread David Gwynne




> On 25 Jan 2023, at 10:03, Martin Schröder  wrote:
> 
> Am Mi., 25. Jan. 2023 um 00:45 Uhr schrieb David Gwynne :
>> I think you can do this on OpenBSD with https://github.com/eait-itig/commarp 
>> and just routing on em0. I don’t think any layer 2 things like bridge or veb 
>> are needed, and probably won’t work anyway because as Claudio said, they 
>> don’t want to hairpin anyway.
> 
> But arp only works for vintage-ip.

You mean IP-classic? I’d argue it should be less than the majority of traffic 
on the Internet before we call it vintage.

The principle could be applied to v6 as well.

Re: OpenBSD as a transparent switch filter

2023-01-24 Thread David Gwynne




> On 25 Jan 2023, at 09:47, Tom Smyth  wrote:
> 
> Hi David is that like a local proxy arp type setup (on typical
> networking gear) .. ?

I’ve never had a clear idea about what proxy ARP is, and the only time it comes 
up in conversation is when people complain about problems it causes. Do you 
have a definition of what you think it means before I say yes or no?

> 
> On Tue, 24 Jan 2023 at 23:45, David Gwynne  wrote:
>> 
>> I think you can do this on OpenBSD with https://github.com/eait-itig/commarp 
>> and just routing on em0. I don’t think any layer 2 things like bridge or veb 
>> are needed, and probably won’t work anyway because as Claudio said, they 
>> don’t want to hairpin anyway.
>> 
>> That code doesn’t have any manpages unfortunately. commarp wants a config 
>> file saying which interface it should run on and which IPs it should 
>> intercept ARP for. eg:
>> 
>> $ cat /etc/commarp.conf
>> interface em0 {
>>allow 192.168.1.16 - 192.168.1.254
>> }
>> 
>> There’s no point rewriting ARP requests for the IP your router is using on 
>> that subnet, or carp addresses on that subnet, etc.
>> 
>> 
>>> On 24 Jan 2023, at 22:16, Cristian Danila  wrote:
>>> 
>>> HI Tom,
>>> 
>>> I am familiar with options you mentioned, veb, bridge and isolated ports.
>>> I am having another transparent filter based of veb also I am aware about
>>> protected members but my use case is different.
>>> 
>>> Let me try to explain maybe with different words.
>>> OpenBSD box is having only one cable input, so what would be the
>>> benefit of having protected members?
>>> Protected members are isolating the communication between members of a
>>> bridge, in my case
>>> I have only one NIC, so if a bridge would be helpful, I can have a
>>> bridge with single member,
>>> therefore isolating that member from who?
>>> OpenBSD box has only one wire connected to a physical switch, so it
>>> can communicate with all members
>>> of the switch, but the physical switch itself do not permit
>>> communication between members as explained.
>>> So it is a desire that OpenBSD box is the one that is making possible
>>> communication between different
>>> members of the switch through same wire.
>>> 
>>> Let me try to draw it, I hope will help more
>>> 
>>> DEVICE1 DEVICE2 DEVICE3
>>>|   |  |
>>>|   |  |
>>> ---
>>> PORT1 PORT2PORT3 PORT 20
>>>   |   |  |_|
>>>   |   |_ |
>>>   |__ |
>>> PHISICAL SWITCH DEVICE  |
>>> ---|
>>>  |
>>>  |
>>>  |
>>>  OPEN BSD BOX
>>> 
>>> 
>>> Thank you.
>>> 
>>> 
>>> On Tue, Jan 24, 2023 at 1:43 PM Tom Smyth  
>>> wrote:
>>>> 
>>>> Hello Cristian,
>>>> if you want to filter on layer 2 ... you would need to use Bridge
>>>> have a look at  man ifconfig(8)
>>>> bridge filter rules can be added to ports in the bridge...
>>>> you can also tag traffic in bridge filter rules and then use PF to
>>>> filter them...
>>>> 
>>>> but if your objective is to isolate ports from each other.. this can
>>>> be achieved with protected port groups...
>>>> again check out ifconfig (8)
>>>> TLDR version bridge ports in the same protected port group are
>>>> isolated from each other...
>>>> If port isolation if all your looking for (no other detailed filtering
>>>> ) if (im not sure) veb(4) supports protected ports...then this would
>>>> be faster...
>>>> but to my shame I have not tried out veb(4)
>>>> 
>>>> I hope this is of some use...
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Tue, 24 Jan 2023 at 11:29, Cristian Danila  wrote:
>>>>> 
>>>>> Hello
>>>>> 
>>>>> I have a more difficult task that I would like to solve with OpenBSD
>>>>> and

Re: OpenBSD as a transparent switch filter

2023-01-24 Thread David Gwynne

I think you can do this on OpenBSD with https://github.com/eait-itig/commarp 
and just routing on em0. I don’t think any layer 2 things like bridge or veb 
are needed, and probably won’t work anyway because as Claudio said, they don’t 
want to hairpin anyway.

That code doesn’t have any manpages unfortunately. commarp wants a config file 
saying which interface it should run on and which IPs it should intercept ARP 
for. eg:

$ cat /etc/commarp.conf  
interface em0 {
allow 192.168.1.16 - 192.168.1.254
}

There’s no point rewriting ARP requests for the IP your router is using on that 
subnet, or carp addresses on that subnet, etc.


> On 24 Jan 2023, at 22:16, Cristian Danila  wrote:
> 
> HI Tom,
> 
> I am familiar with options you mentioned, veb, bridge and isolated ports.
> I am having another transparent filter based of veb also I am aware about
> protected members but my use case is different.
> 
> Let me try to explain maybe with different words.
> OpenBSD box is having only one cable input, so what would be the
> benefit of having protected members?
> Protected members are isolating the communication between members of a
> bridge, in my case
> I have only one NIC, so if a bridge would be helpful, I can have a
> bridge with single member,
> therefore isolating that member from who?
> OpenBSD box has only one wire connected to a physical switch, so it
> can communicate with all members
> of the switch, but the physical switch itself do not permit
> communication between members as explained.
> So it is a desire that OpenBSD box is the one that is making possible
> communication between different
> members of the switch through same wire.
> 
> Let me try to draw it, I hope will help more
> 
> DEVICE1 DEVICE2 DEVICE3
> |   |  |
> |   |  |
> ---
> PORT1 PORT2PORT3 PORT 20
>|   |  |_|
>|   |_ |
>|__ |
> PHISICAL SWITCH DEVICE  |
> ---|
>   |
>   |
>   |
>   OPEN BSD BOX
> 
> 
> Thank you.
> 
> 
> On Tue, Jan 24, 2023 at 1:43 PM Tom Smyth  
> wrote:
>> 
>> Hello Cristian,
>> if you want to filter on layer 2 ... you would need to use Bridge
>> have a look at  man ifconfig(8)
>> bridge filter rules can be added to ports in the bridge...
>> you can also tag traffic in bridge filter rules and then use PF to
>> filter them...
>> 
>> but if your objective is to isolate ports from each other.. this can
>> be achieved with protected port groups...
>> again check out ifconfig (8)
>> TLDR version bridge ports in the same protected port group are
>> isolated from each other...
>> If port isolation if all your looking for (no other detailed filtering
>> ) if (im not sure) veb(4) supports protected ports...then this would
>> be faster...
>> but to my shame I have not tried out veb(4)
>> 
>> I hope this is of some use...
>> 
>> 
>> 
>> 
>> 
>> 
>> On Tue, 24 Jan 2023 at 11:29, Cristian Danila  wrote:
>>> 
>>> Hello
>>> 
>>> I have a more difficult task that I would like to solve with OpenBSD
>>> and I would really
>>> appreciate any ideas if it is possible to achieve such.
>>> 
>>> I have:
>>> - one OpenBSD box with one Ethernet port
>>> - one big switch with multiple devices connected
>>> 
>>> All switch ports are isolated by each other with one exception:
>>> - All ports can communicate with only one Ethernet port(let's say port 20)
>>> 
>>> Now what i would like to achieve is to connect an Ethernet cable between
>>> OpenBSD box and port 20 of the switch, and make OpenBSD a transparent
>>> filtering hub.
>>> 
>>> So I need OpenBSD box to be a transparent bridge and filter between
>>> clients of the switch.
>>> 
>>> Can anybody suggest a point where I can think about?
>>> I was thinking initially to add the nic(em0) to veb0 then with link1
>>> achieve L3 filtering but
>>> definitely I think I miss something important.
>>> I am open to research everything is needed for it but I miss a
>>> starting point and I would
>>> really appreciate any hint.
>>> 
>>> Kind regards,
>>> Claudiu
>>> 
>> 
>> 
>> --
>> Kindest regards,
>> Tom Smyth.
>

Re: veb(4) with multiple vlan(4)'s

2023-01-22 Thread David Gwynne

> On 23 Jan 2023, at 05:42, Hrvoje Popovski  wrote:
> 
> On 22.1.2023. 12:45, David Gwynne wrote:
>>> hostname.veb1
>> description "LAN"
>> 
>>> link1
>> you don't want to enable link1 unless you want pf to filter traffic on
>> the veb ports, and then you have to be careful to avoid having pf see
>> the packet again on the vport1 interface.
>> 
> 
> ah, yes, yes thank you ...
> is because of that, that on tpmr(4) pf is enabled by default and on
> veb(4) isn't?

yes. tpmr takes the ports and their packets away from the IP stack completely, 
so there's no chance of confusion inside pf. veb assumes you're going to plug 
in with vport, so defaults to avoiding confusing pf.

Re: do i need to move to veb?

2023-01-22 Thread David Gwynne

On Sat, Jan 21, 2023 at 03:41:56PM +0300, kasak wrote:
> Hello misc!
> 
> I'm using bridge for integrating remote clients to my network with this
> simple config:
> 
> $ cat /etc/hostname.bridge0
> add vether0
> add em1
> add tap1
> up
> 
> I see in this commit that veb is supposed to replace bridge
> https://marc.info/?l=openbsd-cvs&m=161405102019493&w=2
> 
> Does it make sense to move to veb for me, or not?
> There is approximately 150 clients on the "em1" side and 10 on "tap1"

unless you're using pf to filter on em1 and tap1, moving from
bridge and vether to veb and vport is simple. veb can be a lot faster
than bridge, so maybe that's a reason to try moving?

dlg

Re: veb(4) with multiple vlan(4)'s

2023-01-22 Thread David Gwynne

On Sun, Jan 22, 2023 at 10:25:13AM +0100, Hrvoje Popovski wrote:
> On 22.1.2023. 3:27, Scott Colby wrote:
> > Hello,
> > 
> > I am trying to set up a router with a fresh install of OpenBSD 7.2,
> > and I'm having a hard time grokking how to use veb.
> > 
> > I have organized my network into 4 subnets:
> > 
> > - DHCP "WAN"
> > - 192.168.0.0/24 "LAN"
> > - 192.168.2.0/24 "IOT"
> > - 192.168.3.0/24 "Guest"
> > 
> > My computer has 4 interfaces em{0..3} and my desired setup has the
> > following qualities:
> > - em0 is the WAN uplink with DHCP
> > - em1 is the uplink to my WAP and carries all 3 internal networks,
> >   with "LAN" untagged and "IOT" and "Guest" tagged as VLAN 1102
> >   and 1103, respectively
> > - em2 carries only "LAN", untagged
> > - em3 carries only "IOT", untagged
> > 
> > I think I should have configuration files like:
> > hostname.em0:
> > inet autoconf
> > 
> > hostname.em{1..3}:
> > up
> > 
> > hostname.veb0:
> > add em1
> > add em2
> > add em3
> > add vport0  # ??
> > add vport1  # ??
> > up
> > 
> > As for the vlan and vport interfaces, I have no idea.
> > 
> > After this, of course, I will want to do some filtering with pf
> > (such as hosts on "IOT" and "Guest" not having access to hosts on
> > "LAN.")
> > 

it sounds like you already understand using different interfaces
and subnets to separate/isolate classes of devices on different
networks. your problem is that the same class of network exists on
multiple interfaces on your router.

you could solve this problem by adding more subnets, one for each
interface rather than one per device class, and then applying the
policy to groups of interfaces. eg:

## hostname.em1:
description "LAN Wifi"
group lan
inet 192.168.10.1/24

## hostname.vlan1102
parent em1
vnetid 1102
description "IOT WiFi"
group iot
inet 192.168.12.1/24
up

## hostname.vlan1103
parent em1
vnetid 1103
description "Guest Wifi"
group guest
inet 192.168.13.1/24
up

## hostname.em2
description "LAN Ethernet"
group lan
inet 192.168.0.1/24
up

## hostname.em3
description "IOT Ethernet"
group iot
inet 192.168.2.1/24

then you can write rules using interface groups to apply policy instead
of ip addresses. eg, to block the guest and iot networks from talking to
the lan network:

block out quick on lan all received-on guest
block out quick on lan all received-on iot

however, you're asking about how to join the interfaces together at the
layer 2 level and keep a single layer 3 interface facing each of those
classes of network, which is what hrvoje has written config for below.

> Didn't test this but maybe something like this

yep.

the idea is that separate vebs are isolated like the traffic on
separate vlan interfaces is isolated. you create a veb per class
of device, and add the physical interfaces that face those classes
to their respective vebs. the vebs then only allow layer 2 communication
between the ports, so you add the vports to plug the IP stack on
the firewall into those networks and allow routing and pf between
them.

> hostname.em0
description WAN
> inet autoconf
> 
> hostname.em1
description "LAN Wifi"
> up
> 
> hostname.em2
description "LAN Ethernet"
> up
> 
> hostname.em3
description "IOT Ethernet"
> up
> 
> hostname.vport1
description "LAN"
> inet X.X.X.X/XX <- gateway for LAN
> 
> hostname.veb1
description "LAN"

> link1
you don't want to enable link1 unless you want pf to filter traffic on
the veb ports, and then you have to be careful to avoid having pf see
the packet again on the vport1 interface.

> add em1
> add em2
> add vport1
> up
> 
> hostname.vlan1102
> parent em1
> vnetid 1102
description "IOT WiFi"
> up
> 
> hostname.vport2
description "IOT"
> address X.X.X.X/XX <- gateway for IOT
> 
> hostname.veb2
description "IOT"
> link1

same here, don't set link1

> add vlan1102
> add em3
> add vport2
> up
> 
> hostname.vlan1103
> parent em1
> vnetid 1103
description "Guest Wifi"
> address X.X.X.X/XX <- gateway for Guest
> up
> 
> 
> if this is working then you can use pf to filter traffic between networks.
> 
> man veb
> man ifconfig and search for VEB
> 
> 
> > My questions are thus:
> > 1) What is the proper network configuration to achieve the above
> >    goal?
> > 2) What is the right way to filter packets transiting between subnets
> >    in this configuration? I see in the man page that the directionality
> >    of packets emerging from a veb to the network stack is not normal.
> >    I've seen things with adding groups to the interfaces, but not
> >    sure what that gets me that using interface names in pf.conf
> >    doesn't.

Unless you enable link1 on the veb interfaces, you don't have to worry
about pf and direction. Without link1, pf will only run on the vport
interfaces when traffic is routed between the different subnets.

> > 
> > 
> > Thanks in advance for any help that you can provide!
> > 
> > Scott
> > 
>

Re: bridge(4) question new network setup

2023-01-21 Thread David Gwynne




> On 22 Jan 2023, at 10:44, David Gwynne  wrote:
> 
> On Sat, Jan 21, 2023 at 01:46:34PM -0800, patrick keshishian wrote:
>> On 1/20/23, David Gwynne  wrote:
>>> On Fri, Jan 20, 2023 at 11:09:47AM -0800, patrick keshishian wrote:
>>>> Hello,
>>>> 
>>>> I am trying get a new ISP setup working.  The Router is
>>>> causing some pain.  There is a /28 public block assigned.
>>>> The DSL router can't be configured in transparent bridge
>>>> mode (they say).  It holds on to one of the /28 addresses.
>>> 
>>> i'm sure they say that, but that doesn't mean it's impossible. this
>>> will be a lot easier and more useful if you can get a dsl modem
>>> into bridge/transparent mode and do all the routing on your own
>>> box.
>> 
>> OK. So the situation was a bit worse than I had actually
>> anticipated.  After I got the described setup configured
>> I noticed that the DSL Router/Modem wouldn't send out
>> any traffic unless it had an arp entry for the source.
>> e.g., nat-to an unassigned IP from the /28 wouldn't go out.
>> 
>> Again, in my limited networking knowledge, it meant I had
>> to do proxy arp entries for /28 public IPs in the $dmz.
>> This was quite frustrating.
>> 
>> So I started poking around in the DSL Router/modem settings
>> (cuing off your "doesn't mean it's impossible") and I
>> have it now acting as a transparent bridge!
>> 
>> I spent most of Tues on the phone with their techs, and I
>> was assured that is not possible/unsupported.  Now maybe
>> they actually meant "unsupported" mode as far as their
>> support is concerned.
>> 
>> But things seem to be running as expected (so far)!  So thanks
>> for the bit of "encouragement"!
> 
> Does that mean you have the WAN IP on your router now? And you can do
> whatever you want with the /28?
> 
>>> that would also give you the option to do fun stuff like NOT putting
>>> the /28 onto an ethernet network so you could use all 16 of the
>>> IPs on dmz hosts instead of losing some to network/broadcast/gateway.
>> 
>> I am curious how you would go about doing what you suggest:
>> Using all 16 of /28.
> 
> The simple (and currently best supported) way is to set up a tunnel
> interface for every IP in the /28 and connect the tunnel to the server
> providing the service. The router would have a config like this:
> 
> ifconfig gif0 create
> ifconfig gif0 tunnel $router_lan_ip $server_lan_ip
> ifconfig gif0 inet $router_gif_ip $server_slash28_ip

you can also just rdr connections for the /28 IPs to other hosts; the /28
addresses don't have to be real IPs assigned to hosts anywhere.


> 
>> 
>> Thanks for your reply,
>> --patrick
>> 
>> 
>>>> The setup looks something like this:
>>>> (and hopefully the ascii "art" remains intact from gmail)
>>>> 
>>>>     ( internet )
>>>>           |
>>>>           | [WAN IP]
>>>>    +------o------+
>>>>   /  DSL ROUTER /  <-- Transparent bridge mode NOT possible
>>>>  +------o------+
>>>>         | [ one of /28 Public IPs = $dslgw_ip ]
>>>>         |
>>>>         |
>>>>         | $ext
>>>>  +------o------+
>>>>  |             |
>>>>  | OpenBSD/pf  o--- ( rest of /28 Public IP network )
>>>>  |             | $dmz  (DMZ: httpd, smtpd, ...)
>>>>  +------o------+
>>>>    $lan | [10.x.x.1]
>>>>         |
>>>>  ( 10.x.x.x network )
>>>> 
>>>> 
>>>> As far as networking goes, I need to be spoken to as if I'm
>>>> a fledgling.
>>>> 
>>>> I want to do the obvious: use OpenBSD/pf(4) to:
>>>> - Filter traffic from $ext to $dmz
>>>> - Filter traffic from $dmz outbound
>>>> - Filter traffic from $lan (10.x.x.x) to $dmz
>>>> - NAT traffic from $lan (10.x.x.x) outbound to internet
>>>> 
>>>> 
>>>> I'm bridge(4)-ing $ext and $dmz.  Which means I must give
>>>> one of the /28 public IP addresses to either $ext or $dmz
>>>> to be able to do:
>>>> 
>>>> # route add default $dslgw_ip
>>>> 
>>>> (!?)
>>>> 
>>>> Am I missing something?
>>>> Is there a better way to configure things?
>>>> 
>>>> Thanks,
>>>> --patrick

Re: bridge(4) question new network setup

2023-01-21 Thread David Gwynne

On Sat, Jan 21, 2023 at 01:46:34PM -0800, patrick keshishian wrote:
> On 1/20/23, David Gwynne  wrote:
> > On Fri, Jan 20, 2023 at 11:09:47AM -0800, patrick keshishian wrote:
> >> Hello,
> >>
> >> I am trying get a new ISP setup working.  The Router is
> >> causing some pain.  There is a /28 public block assigned.
> >> The DSL router can't be configured in transparent bridge
> >> mode (they say).  It holds on to one of the /28 addresses.
> >
> > i'm sure they say that, but that doesn't mean it's impossible. this
> > will be a lot easier and more useful if you can get a dsl modem
> > into bridge/transparent mode and do all the routing on your own
> > box.
> 
> OK. So the situation was a bit worse than I had actually
> anticipated.  After I got the described setup configured
> I noticed that the DSL Router/Modem wouldn't send out
> any traffic unless it had an arp entry for the source.
> e.g., nat-to an unassigned IP from the /28 wouldn't go out.
> 
> Again, in my limited networking knowledge, it meant I had
> to do proxy arp entries for /28 public IPs in the $dmz.
> This was quite frustrating.
> 
> So I started poking around in the DSL Router/modem settings
> (cuing off your "doesn't mean it's impossible") and I
> have it now acting as a transparent bridge!
> 
> I spent most of Tues on the phone with their techs, and I
> was assured that is not possible/unsupported.  Now maybe
> they actually meant "unsupported" mode as far as their
> support is concerned.
> 
> But things seem to be running as expected (so far)!  So thanks
> for the bit of "encouragement"!

Does that mean you have the WAN IP on your router now? And you can do
whatever you want with the /28?

> > that would also give you the option to do fun stuff like NOT putting
> > the /28 onto an ethernet network so you could use all 16 of the
> > IPs on dmz hosts instead of losing some to network/broadcast/gateway.
> 
> I am curious how you would go about doing what you suggest:
> Using all 16 of /28.

The simple (and currently best supported) way is to set up a tunnel
interface for every IP in the /28 and connect the tunnel to the server
providing the service. The router would have a config like this:

ifconfig gif0 create
ifconfig gif0 tunnel $router_lan_ip $server_lan_ip
ifconfig gif0 inet $router_gif_ip $server_slash28_ip

> 
> Thanks for your reply,
> --patrick
> 
> 
> >> The setup looks something like this:
> >> (and hopefully the ascii "art" remains intact from gmail)
> >>
> >>     ( internet )
> >>           |
> >>           | [WAN IP]
> >>    +------o------+
> >>   /  DSL ROUTER /  <-- Transparent bridge mode NOT possible
> >>  +------o------+
> >>         | [ one of /28 Public IPs = $dslgw_ip ]
> >>         |
> >>         |
> >>         | $ext
> >>  +------o------+
> >>  |             |
> >>  | OpenBSD/pf  o--- ( rest of /28 Public IP network )
> >>  |             | $dmz  (DMZ: httpd, smtpd, ...)
> >>  +------o------+
> >>    $lan | [10.x.x.1]
> >>         |
> >>  ( 10.x.x.x network )
> >>
> >>
> >> As far as networking goes, I need to be spoken to as if I'm
> >> a fledgling.
> >>
> >> I want to do the obvious: use OpenBSD/pf(4) to:
> >>  - Filter traffic from $ext to $dmz
> >>  - Filter traffic from $dmz outbound
> >>  - Filter traffic from $lan (10.x.x.x) to $dmz
> >>  - NAT traffic from $lan (10.x.x.x) outbound to internet
> >>
> >>
> >> I'm bridge(4)-ing $ext and $dmz.  Which means I must give
> >> one of the /28 public IP addresses to either $ext or $dmz
> >> to be able to do:
> >>
> >> # route add default $dslgw_ip
> >>
> >> (!?)
> >>
> >> Am I missing something?
> >> Is there a better way to configure things?
> >>
> >> Thanks,
> >> --patrick
> >>
> >

Re: bridge(4) question new network setup

2023-01-21 Thread David Gwynne

On Sat, Jan 21, 2023 at 01:32:18PM -0800, patrick keshishian wrote:
> On 1/20/23, Hrvoje Popovski  wrote:
> > On 20.1.2023. 20:09, patrick keshishian wrote:
> >> Hello,
> >>
> >> I am trying get a new ISP setup working.  The Router is
> >> causing some pain.  There is a /28 public block assigned.
> >> The DSL router can't be configured in transparent bridge
> >> mode (they say).  It holds on to one of the /28 addresses.
> >>
> >> The setup looks something like this:
> >> (and hopefully the ascii "art" remains intact from gmail)
> >>
> >>     ( internet )
> >>           |
> >>           | [WAN IP]
> >>    +------o------+
> >>   /  DSL ROUTER /  <-- Transparent bridge mode NOT possible
> >>  +------o------+
> >>         | [ one of /28 Public IPs = $dslgw_ip ]
> >>         |
> >>         |
> >>         | $ext
> >>  +------o------+
> >>  |             |
> >>  | OpenBSD/pf  o--- ( rest of /28 Public IP network )
> >>  |             | $dmz  (DMZ: httpd, smtpd, ...)
> >>  +------o------+
> >>    $lan | [10.x.x.1]
> >>         |
> >>  ( 10.x.x.x network )
> >>
> >>
> >> As far as networking goes, I need to be spoken to as if I'm
> >> a fledgling.
> >>
> >> I want to do the obvious: use OpenBSD/pf(4) to:
> >>  - Filter traffic from $ext to $dmz
> >>  - Filter traffic from $dmz outbound
> >>  - Filter traffic from $lan (10.x.x.x) to $dmz
> >>  - NAT traffic from $lan (10.x.x.x) outbound to internet
> >>
> >>
> >> I'm bridge(4)-ing $ext and $dmz.  Which means I must give
> >> one of the /28 public IP addresses to either $ext or $dmz
> >> to be able to do:
> >>
> >> # route add default $dslgw_ip
> >>
> >> (!?)
> >>
> >> Am I missing something?
> >> Is there a better way to configure things?
> >>
> >> Thanks,
> >> --patrick
> >>
> >
> > Hi,
> >
> > If your ext interface is in same subnet as that /28 from your ISP then
> > you could:
> >
> > - use veb(4) to bridge ext, dmz and vport(4) interface and add default
> > route to dslgw_ip. vport is ip interface for veb
> 
> I started out looking at veb(4) but I wasn't confident
> how I could filter traffic in/out of $dmz.  Also, the
> description of vport(4) which states "packets traversing
> vport interfaces appear to travel in the opposite direction
> to packets travelling over other ports" confused me even
> more.  So I started using bridge(4).

When you add a port to veb(4), it takes it over completely and by
default it only uses it to switch traffic at layer 2 (Ethernet).
In other words, by default veb(4) does not run pf against packets
on ports.

vport is an exception because it operates as if it is a normal
ethernet interface plugged into a switchport, it's just that the
switch in this situation is veb, and the other ports on that switch
are the non-vport interfaces you added to the veb.

So, by default veb lets you build a switch out of other interfaces
in the system, and vport lets you plug the kernel network stack
into that virtual switch. Because packets from a normal switch coming
into a normal physical interface go in to the network stack, that is
also how it behaves with vport. ie, you write rules in pf like this for
packets coming from a veb into a vport:

  pass in on vport0 inet proto tcp from any to port ssh

If you do enable IP filtering on veb (ie, you ifconfig veb0 link1 as per
the ifconfig manpage), then packets coming from the "wire" into the
interface are filtered by pf too. This means that if a packet is coming
from the wire and is destined to your network stack via a vport
interface, it will be going through pf twice: once when it comes into
the physical interface and again when it goes over vport.

pf is not designed for a packet to be processed twice. TCP packets in
particular going through pf twice will confuse the window tracking. If
you're doing NAT or something like that, it can also get confused.

So if you're going to enable link1 on veb(4), you need to either skip pf
on the vport interface, or put the veb and vport into different rdomains
so pf will keep the states for them separate.

It is doable and supported, you just need to be mindful of this
semantic.

I found running pf on bridge(4) to be a nightmare, cos every interface
you add as a port on bridge kind of acts as two ports, one that goes to
the wire and another that goes to the stack, but it's hard to say which
will happen and what the right way to filter it is. veb(4) taking over
interfaces completely and not running pf by default is in large part
because of this pain I had with bridge.

> > - or on ext interface put ip alias with ip addresses from /28 public
> > range and than do binat-to or nat-to in pf to hosts in dmz
> >
> > or maybe i totally misunderstood you  :)
> 
> I think you understood me fine. I'm just not too familiar
> with how networking actually works.

Then on top of the networking theory there's the quirks of how
different systems implement things...

Re: bridge(4) question new network setup

2023-01-20 Thread David Gwynne

On Fri, Jan 20, 2023 at 11:09:47AM -0800, patrick keshishian wrote:
> Hello,
> 
> I am trying get a new ISP setup working.  The Router is
> causing some pain.  There is a /28 public block assigned.
> The DSL router can't be configured in transparent bridge
> mode (they say).  It holds on to one of the /28 addresses.

i'm sure they say that, but that doesn't mean it's impossible. this
will be a lot easier and more useful if you can get a dsl modem
into bridge/transparent mode and do all the routing on your own
box.

that would also give you the option to do fun stuff like NOT putting
the /28 onto an ethernet network so you could use all 16 of the
IPs on dmz hosts instead of losing some to network/broadcast/gateway.
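
the arithmetic behind that, as a rough sketch: a /28 covers 2^(32-28) = 16
addresses, and putting it on an ethernet segment costs the network address,
the broadcast address and one address for the gateway. the function names
below are made up for this example, and the count of three lost addresses
assumes a single gateway address on the segment.

```c
#include <assert.h>

/* total number of IPv4 addresses covered by a prefix of the given length */
unsigned int
prefix_addresses(unsigned int prefixlen)
{
	assert(prefixlen >= 1 && prefixlen <= 32);
	return (1U << (32 - prefixlen));
}

/*
 * addresses left for hosts when the prefix is configured on an ethernet
 * segment: the network, broadcast and (assumed single) gateway addresses
 * are not available to hosts.
 */
unsigned int
onlink_host_addresses(unsigned int prefixlen)
{
	assert(prefixlen <= 30);
	return (prefix_addresses(prefixlen) - 3);
}
```

for a /28 this works out to 16 addresses in total and 13 left for hosts
when the prefix sits on a wire.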

> The setup looks something like this:
> (and hopefully the ascii "art" remains intact from gmail)
> 
>     ( internet )
>           |
>           | [WAN IP]
>    +------o------+
>   /  DSL ROUTER /  <-- Transparent bridge mode NOT possible
>  +------o------+
>         | [ one of /28 Public IPs = $dslgw_ip ]
>         |
>         |
>         | $ext
>  +------o------+
>  |             |
>  | OpenBSD/pf  o--- ( rest of /28 Public IP network )
>  |             | $dmz  (DMZ: httpd, smtpd, ...)
>  +------o------+
>    $lan | [10.x.x.1]
>         |
>  ( 10.x.x.x network )
> 
> 
> As far as networking goes, I need to be spoken to as if I'm
> a fledgling.
> 
> I want to do the obvious: use OpenBSD/pf(4) to:
>  - Filter traffic from $ext to $dmz
>  - Filter traffic from $dmz outbound
>  - Filter traffic from $lan (10.x.x.x) to $dmz
>  - NAT traffic from $lan (10.x.x.x) outbound to internet
> 
> 
> I'm bridge(4)-ing $ext and $dmz.  Which means I must give
> one of the /28 public IP addresses to either $ext or $dmz
> to be able to do:
> 
> # route add default $dslgw_ip
> 
> (!?)
> 
> Am I missing something?
> Is there a better way to configure things?
> 
> Thanks,
> --patrick
>

Re: Stretch/L2VPN between two datacenters

2023-01-19 Thread David Gwynne

>
> > Thanks for your replies. It has been Xmas and I have been delayed, but I
> > have now read up upon it. I am going for the tpmr(4). We are going to
> > replicate a lot of live data from Site1 to Site2, and my experiences with
> > OpenVPN is that it is great, but not high performing. So I have established
> > a WireGuard connection with one OBSD on each site, and I am planning to
> > tunnel tpmr through this - I guess that tpmr itself is not encrypted in any
> > way?
> >
> > Regards, Lars.
> >
> > On Fri, Dec 16, 2022 at 4:30 PM deich...@placebonol.com <
> > deich...@placebonol.com> wrote:
> >
> >> I've run L2 over an IPsec tunnel using egre (gre(4)) and bridge (bridge
> >> (4)) to connect systems in different locations together.
> >>
> >> This was done before David Gwynne created tpmr(4).  I've been too lazy to
> >> reimplement my current configuration.
> >>
> >> 73
> >> diana
> >>
> >

Re: DHCP server ignoring PF rules?

2022-12-17 Thread David Gwynne

dhcpd reads packets off the wire using BPF, which happens as packets come off 
the network interface, but before the IP stack where pf runs.

> On 17 Dec 2022, at 22:40, Cristian Danila  wrote:
> 
> Good day!
> I finished setup an DHCP server and for some reason it seems DHCP
> server is ignoring PF filter.
> In short, in PF I have active only one rule:
> block drop quick all
> 
> Double checked PF and it is enabled
> So using a windows machine to test DHCP server:
> 1) ipconfig /release
> 2) ipconfig /renew
> 
> somehow dhcpd still serves the windows(only when is enabled) and
> ignores PF rule.
> Could you please help me in telling if dhcpd has some intended logic
> to ignore PF or what might
> cause this unexpected behavior?
> 
> Kind Regards!
>

Re: Stretch/L2VPN between two datacenters

2022-12-16 Thread David Gwynne

On Fri, Dec 16, 2022 at 11:39:02AM +0100, Hrvoje Popovski wrote:
> On 16.12.2022. 11:33, Lars Bonnesen wrote:
> > We are about to migrate VM's from one datacenter to another and the VMware
> > L2VPN we are using for this is simply not stable for some reason that we
> > cannot figure out why.
> > 
> > I have used GRE-tunneling before on a software router that I actually
> > cannot remember the name of now, but if OpenBSD can do the same, I would
> > rather deploy one OpenBSD on each site and have that task handled by
> > OpenBSD.
> > 
> > Each site should be able to use the other site gateway over a
> > L2-network.and VMs on each site should be able to see each other as they
> > are on the same LAN
> > 
> > Where to start reading?
> 
> 
> man tpmr

yes. i wrote tpmr for this exact situation. i wanted to connect
switches in different datacentres together over tunnels (etherip
in my case) while i was migrating from one site to the other.

i was considering calling the driver xconnect or xcon, but went with
tpmr because i was reading the ethernet bridge specification at the time
and it talks about a special type of bridge called a two port mac relay.

Re: Setting up vmd with veb0/vport0

2022-05-12 Thread David Gwynne

It looks like vport0 is down. Add "up" to hostname.vport0 and run "ifconfig
vport0 up".

On Thu, 12 May 2022 at 15:40, David Demelier  wrote:

> Hello,
>
> I'm trying to setup vms using the wonderful vmd and private addresses
> on 10.0.0.0 range. Following the various entries in the FAQ (faq16) and
> the examples using bridge/vether I just wanted to adapt to using
> veb/vport instead since it's designed as a newer and more performant
> replacement.
>
> I've also seen someone who managed to get it working
>
>
> https://misc.openbsd.narkive.com/nAdmGfbQ/i-can-t-get-veb-vport-to-work-with-vmd
>
> So first, I setup the interfaces:
>
> # cat /etc/hostname.veb0
> add vport0
> up
> # cat /etc/hostname.vport0
> inet 10.0.0.1 255.255.255.0
>
> I enable NAT as specified in the FAQ and numerous examples.
>
> # cat /etc/pf.conf
> set skip on lo0
>
> match in all scrub (no-df random-id max-mss 1440)
> match out on egress inet from vport0:network to any nat-to (egress)
>
> block log
> pass out quick inet
> pass in on vport0 inet
>
> Then, setting up vmd to boot an install71.iso with the appropriate tap
> interfaces:
>
> # cat /etc/vm.conf
> switch "switch0" {
> interface veb0
> }
>
> vm "vm1" {
> disk "/vm/vm1.qcow2"
> boot device cdrom
> cdrom "/vm/install71.iso"
>
> interface tap {
> switch "switch0"
> }
> }
>
> Finally, once the install is boot, I've tried adding 10.0.0.10 netmask
> 255.255.255.0 and 10.0.0.1 as gateway with no luck. The nameserver is
> copied from /etc/resolv.conf but I can't get any packet to the
> internet.
>
> (vm) #
> ping 8.8.8.8
> PING 8.8.8.8 (8.8.8.8): 56 data bytes
> ping: sendmsg: Can't assign requested address
> ping: wrote 8.8.8.8 64 chars, ret=-1
> (vm) #
> # ftp http://5.135.187.121/index.html
> Trying 5.135.187.121...
> ftp: connect: Can't assign requested address
>
> I'm sure I miss almost nothing but I can't find what.
>
> Here's the host full ifconfig
>
> lo0: flags=8049 mtu 32768
> index 4 priority 0 llprio 3
> groups: lo
> inet6 ::1 prefixlen 128
> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
> inet 127.0.0.1 netmask 0xff000000
> iwx0: flags=808843
> mtu 1500
> lladdr e0:d4:64:3c:31:9c
> index 1 priority 4 llprio 3
> groups: wlan egress
> media: IEEE802.11 autoselect (VHT-MCS9 mode 11ac)
> status: active
> ieee80211: join "abc" chan 149 bssid aa:37:d8:93:98:57 82%
> wpakey wpaprotos wpa2 wpaakms psk wpaciphers ccmp wpagroupcipher ccmp
> inet 172.20.10.3 netmask 0xfffffff0 broadcast 172.20.10.15
> em0: flags=808843 mtu
> 1500
> lladdr 8c:8c:aa:01:7d:1f
> index 2 priority 0 llprio 3
> media: Ethernet autoselect (none)
> status: no carrier
> enc0: flags=0<>
> index 3 priority 0 llprio 3
> groups: enc
> status: active
> veb0: flags=8843
> description: switch1-switch0
> index 5 llprio 3
> groups: veb
> vport0 flags=3
> port 7 ifpriority 0 ifcost 0
> tap0 flags=3
> port 8 ifpriority 0 ifcost 0
> vlan0: flags=8002 mtu 1500
> lladdr e0:d4:64:3c:31:9c
> index 6 priority 0 llprio 3
> encap: vnetid none parent iwx0 txprio packet rxprio outer
> groups: vlan
> media: IEEE802.11 autoselect (VHT-MCS9 mode 11ac)
> status: active
> vport0: flags=8902 mtu 1500
> lladdr fe:e1:ba:d0:32:b5
> index 7 priority 0 llprio 3
> groups: vport
> inet 10.0.0.1 netmask 0xffffff00 broadcast 10.0.0.255
> tap0: flags=8943 mtu
> 1500
> lladdr fe:e1:ba:d1:f2:03
> description: vm1-if0-vm1
> index 8 priority 0 llprio 3
> groups: tap
> status: active
>
> Any help is appreciated.
>
> Regards,
>
> --
> David
>
>

Re: vxlan(4) in endpoint mode

2022-04-04 Thread David Gwynne




> On 3 Apr 2022, at 21:46, Denis Fondras  wrote:
> 
> Hi,
> 
> In vxlan(4) manual, we have :
> 
> endpoint mode
>   When configured without a tunnel destination address, vxlan operates as
>   a bridge, but with learning disabled.
> 
> 
> The question is : is it possible to set tunnel source address without a
> destination ?

ifconfig vxlan0 tunneladdr 10.20.30.1

that should work.

Re: Changing rdomain on an interface after the rdomain has already been set openbsd7.0 / 7.1snapshots

2022-04-01 Thread David Gwynne

loopback interfaces are special and kind of end up representing an rdomain 
inside the kernel, which is where this restriction comes from.

dlg

> On 2 Apr 2022, at 09:36, Tom Smyth  wrote:
> 
> Hello,
>  I came across an issue that once a rdomain is set on a
> loopback interface
> you cant change it without destroying and re-creating the interace,
> while it appears you can change a virtio  network interface, is this a
> bug or a feature
> 
> tobsd# ifconfig lo3 create
> tobsd# ifconfig lo3 rdomain 3
> tobsd# ifconfig lo3 inet 127.0.0.1/8
> tobsd# ifconfig lo3
> lo3: flags=8049 rdomain 3 mtu 32768
> index 5 priority 0 llprio 3
> groups: lo
> inet6 ::1 prefixlen 128
> inet6 fe80::1%lo3 prefixlen 64 scopeid 0x5
> inet 127.0.0.1 netmask 0xff000000
> tobsd# ifconfig lo3 rdomain 0
> ifconfig: SIOCSIFRDOMAIN: Operation not permitted
> tobsd# ifconfig
> 
> 
> -- 
> Kindest regards,
> Tom Smyth.
>

Re: tcpdump - ifname in filter expression

2022-03-27 Thread David Gwynne

On Wed, Mar 23, 2022 at 02:34:54PM -0400, Aner Perez wrote:
> On 3/22/22 00:37, David Gwynne wrote:
> > On Mon, Mar 21, 2022 at 04:37:59PM -0400, Aner Perez wrote:
> > > > I noticed that if I put an "ifname" (or "on") in a filter expression for
> > > > tcpdump, it will show all traffic that has an ifname that *starts with* the
> > > > name I provided.  e.g.
> > > 
> > > # tcpdump -n -l -e -ttt -i pflog0 ifname vlan1
> > > 
> > > Will show packets for vlan1 but also for vlan110, vlan140, etc (but not 
> > > for em0).
> > > 
> > > It's not clear from the man page if that is the intended behavior.
> > > 
> > > https://man.openbsd.org/tcpdump.8#ifname
> > > 
> > > > ifname interface
> > > >     True if the packet was logged as coming from the specified interface
> > > >     (applies only to packets logged by pf(4)).
> > > 
> > > > While testing I also tried using "ifname vlan" as the filter but it fails
> > > > with a syntax error.  I'm thinking that is probably an unintended
> > > > interaction with the "vlan" primitive since "ifname em" or "ifname bnx" seem
> > > > to work with no error.
> > > 
> > > This is all tested on 6.7 so apologies if this is not the current 
> > > behavior.
> > i think this behaviour with ifname is unintended. the diff below tries
> > to fix it by having the ifname comparison include the terminating nul
> > when doing a comparison of the supplied interface name and the one in
> > the pflog header.
> > 
> > the consequence is that it will no longer do string prefix matches,
> > only whole name matches.
> > 
> > the vlan thing is different because there's a "vlan" keyword in our
> > pcap filter language that lets you do things like "tcpdump vlan
> > 123" when sniffing on a vlan parent interface to limit the packets
> > to those with tag 123. the parser is saying it didnt expect you to
> > talk about vlan when it's supposed to be a string (ie, not a keyword)
> > at that point.
> > 
> > Index: gencode.c
> > ===================================================================
> > RCS file: /cvs/src/lib/libpcap/gencode.c,v
> > retrieving revision 1.60
> > diff -u -p -r1.60 gencode.c
> > --- gencode.c	13 Feb 2022 20:02:30 -0000	1.60
> > +++ gencode.c	22 Mar 2022 04:29:40 -0000
> > @@ -3230,7 +3246,7 @@ gen_pf_ifname(char *ifname)
> >  		    len - 1);
> >  		/* NOTREACHED */
> >  	}
> > -	b0 = gen_bcmp(off, strlen(ifname), ifname);
> > +	b0 = gen_bcmp(off, strlen(ifname) + 1, ifname);
> >  	return (b0);
> >  }
> > 
> That certainly seems like it would do the trick.  Would your diff make it
> into the official source tree for a future release or is this something that
> needs to be discussed by the powers that be?

i thought i was the relevant power :'(

deraadt@ said ok too, so i've put it in. should be in snaps soon and the
next release.

Re: tcpdump - ifname in filter expression

2022-03-21 Thread David Gwynne

On Mon, Mar 21, 2022 at 04:37:59PM -0400, Aner Perez wrote:
> I noticed that if I put an "ifname" (or "on") in a filter expression for
> tcpdump, it will show all traffic that has an ifname that *starts with* the
> name I provided.  e.g.
> 
> # tcpdump -n -l -e -ttt -i pflog0 ifname vlan1
> 
> Will show packets for vlan1 but also for vlan110, vlan140, etc (but not for 
> em0).
> 
> It's not clear from the man page if that is the intended behavior.
> 
> https://man.openbsd.org/tcpdump.8#ifname
> 
> ifname interface
>     True if the packet was logged as coming from the specified interface
>     (applies only to packets logged by pf(4)).
> 
> While testing I also tried using "ifname vlan" as the filter but it fails
> with a syntax error.  I'm thinking that is probably an unintended
> interaction with the "vlan" primitive since "ifname em" or "ifname bnx" seem
> to work with no error.
> 
> This is all tested on 6.7 so apologies if this is not the current behavior.

i think this behaviour with ifname is unintended. the diff below tries
to fix it by having the ifname comparison include the terminating nul
when doing a comparison of the supplied interface name and the one in
the pflog header.

the consequence is that it will no longer do string prefix matches,
only whole name matches.

the vlan thing is different because there's a "vlan" keyword in our
pcap filter language that lets you do things like "tcpdump vlan
123" when sniffing on a vlan parent interface to limit the packets
to those with tag 123. the parser is saying it didnt expect you to
talk about vlan when it's supposed to be a string (ie, not a keyword)
at that point.
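
a toy version of that interaction, assuming a lexer that always hands
reserved words back as keyword tokens (this is not the real libpcap
scanner or grammar; the token names and both functions are hypothetical):

```c
#include <string.h>

/* hypothetical token types for this sketch */
enum tok { TOK_ID, TOK_VLAN, TOK_IFNAME };

/* reserved words win over plain identifiers */
static enum tok
lex_word(const char *w)
{
	if (strcmp(w, "vlan") == 0)
		return (TOK_VLAN);
	if (strcmp(w, "ifname") == 0 || strcmp(w, "on") == 0)
		return (TOK_IFNAME);
	return (TOK_ID);
}

/*
 * hypothetical grammar rule: "ifname" (or "on") must be followed by
 * an identifier. returns 0 if the pair parses, -1 on a syntax error.
 */
static int
parse_ifname(const char *kw, const char *arg)
{
	if (lex_word(kw) != TOK_IFNAME)
		return (-1);
	if (lex_word(arg) != TOK_ID)
		return (-1);	/* got a keyword where a string was wanted */
	return (0);
}
```

in this model "ifname em" and "ifname vlan1" parse, because "em" and
"vlan1" are not reserved words, while "ifname vlan" is rejected.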

Index: gencode.c
===
RCS file: /cvs/src/lib/libpcap/gencode.c,v
retrieving revision 1.60
diff -u -p -r1.60 gencode.c
--- gencode.c   13 Feb 2022 20:02:30 -  1.60
+++ gencode.c   22 Mar 2022 04:29:40 -
@@ -3230,7 +3246,7 @@ gen_pf_ifname(char *ifname)
len - 1);
/* NOTREACHED */
}
-   b0 = gen_bcmp(off, strlen(ifname), ifname);
+   b0 = gen_bcmp(off, strlen(ifname) + 1, ifname);
return (b0);
 }

Re: PF bi-nat

2022-02-24 Thread David Gwynne

On Wed, Feb 23, 2022 at 04:55:05PM +, Laura Smith wrote:
> I've never had occasion to use bi-nat before and I'm struggling a little to 
> wrap my head around the concept.
> 
> The OpenBSD FAQ (https://www.openbsd.org/faq/pf/nat.html) gives the following 
> example:
> 
> "pass on tl0 from $web_serv_int to any binat-to $web_serv_ext"
> 
> However I'm not clear on how this fits into the overall filtering strategy ?  
> i.e. building logically on the example above, how do I say "only allow 
> inbound bi-nat for ports 80 & 443".
> 
> The FAQ makes an obscure statement "TCP and UDP ports are never modified with 
> binat-to rules as they are with nat rules.", which I'm guessing is where the 
> answer lies.  But I'm not clear what this means in context ?
> 
> Thanks !

turns out binat is syntactic sugar, so it can be understood in terms of
nat and rdr rules. let's say 192.0.2.1 is your external ip, 10.0.0.1
is your internal ip, and em0 is your external interface:

dlg@ix ~$ echo 'pass on em0 inet from 10.0.0.1 to any binat-to 192.0.2.1' | 
pfctl -vnf - 
pass out on em0 inet from 10.0.0.1 to any flags S/SA nat-to 192.0.2.1 
static-port
pass in on em0 inet from any to 192.0.2.1 flags S/SA rdr-to 10.0.0.1

i read that as any connection to my external ip is forwarded to the
backend, and any connection from my backend server is rewritten to
appear as if it's coming from my external ip. this could be useful if
you've got a small public ip address allocation (eg a /29) from an
ISP and don't want to burn the network and broadcast addresses by
putting them on an actual subnet. you would binat every public IP
to a backend on a private IP instead. id personally use p2p tunnels
from the router to each backend, but maybe MTU/MRU is precious too?

anyway, in terms of policy, restricting this to ports 80 and 443
looks a bit clumsy:

dlg@ix ~$ echo 'pass on em0 inet from 10.0.0.1 to any port { 80 443 } binat-to 
192.0.2.1' | pfctl -vnf -
stdin:1: port only applies to tcp/udp
stdin:1: skipping rule due to errors
stdin:1: port only applies to tcp/udp
stdin:1: skipping rule due to errors
stdin:1: port only applies to tcp/udp
stdin:1: skipping rule due to errors
stdin:1: rule expands to no valid combination
stdin:1: port only applies to tcp/udp
stdin:1: skipping rule due to errors
stdin:1: port only applies to tcp/udp
stdin:1: skipping rule due to errors
stdin:1: rule expands to no valid combination
stdin:1: rule expands to no valid combination
dlg@ix ~$ echo 'pass on em0 inet proto tcp from 10.0.0.1 to any port { 80 443 } 
binat-to 192.0.2.1' | pfctl -vnf -
pass out on em0 inet proto tcp from 10.0.0.1 to any port = 80 flags S/SA nat-to 
192.0.2.1 static-port
pass in on em0 inet proto tcp from any port = 80 to 192.0.2.1 flags S/SA rdr-to 
10.0.0.1
pass in on em0 inet proto tcp from any port = 443 to 192.0.2.1 flags S/SA 
rdr-to 10.0.0.1
pass out on em0 inet proto tcp from 10.0.0.1 to any port = 443 flags S/SA 
nat-to 192.0.2.1 static-port
pass in on em0 inet proto tcp from any port = 443 to 192.0.2.1 flags S/SA 
rdr-to 10.0.0.1

yeah. im not sure the pass out rules are that useful in practice.

if you had a default allow policy, then binat could make sense. you'd
have a pass binat rule followed by block rules to filter out the
exceptions to your default policy.

another option could be using match and tags:

dlg@ix ~$ cat /tmp/rules
match on em0 inet from 10.0.0.1 to any binat-to 192.0.2.1 tag backend
pass out on em0 tagged backend
pass in on em0 inet proto tcp to port { 80 443 } tagged backend
dlg@ix ~$ pfctl -vnf /tmp/rules
match out on em0 inet from 10.0.0.1 to any tag backend nat-to 192.0.2.1 
static-port
match in on em0 inet from any to 192.0.2.1 tag backend rdr-to 10.0.0.1
pass in on em0 inet proto tcp from any to any port = 80 flags S/SA tagged 
backend
pass in on em0 inet proto tcp from any to any port = 443 flags S/SA tagged 
backend
pass out on em0 all flags S/SA tagged backend

dlg

Re: Capturing redirected packets?

2022-02-10 Thread David Gwynne

> On 10 Feb 2022, at 18:55, Stuart Henderson  wrote:
> 
> Normally if you have two addresses on the same lan you'd configure them
> as aliases on the one interface, this seems a bit of a non-standard
> config.

If aggr/trunk to increase bandwidth makes sense, then you can think of 
configuring multiple IPs on different interfaces on the same network as the 
same kind of thing but at layer 3 instead of at layer 2. It's just ECMP. If you 
have the net.inet.ip.multipath sysctl set to 1 it should work, but I think we 
need an rtable diff I've had in my back pocket for a few years for dealing with 
the ARP side of things to actually make it work.

dlg

Re: GRE IP6/IP6 not working as soon as pf is enabled

2022-01-16 Thread David Gwynne

you've set the net.inet.gre.allow sysctl to 1, right?

> On 16 Jan 2022, at 17:05, Markus Wipp  wrote:
> 
> Hi David,
> 
> First of all thank you so much taking the time for my question!
> 
>> My first impression is that you're confusing where to apply policy to
>> the encapsulated traffic. "pass on gre proto gre" implies you're
>> trying to pass GRE packets as they go over gre(4) interfaces, but
>> it's the unencapsulated packets that go over gre(4), and the GRE
>> encapsulated packets will go over your "underlay" or physical
>> interfaces, which looks like em0 according to tcpdump.
> 
> Yes, it might be that I’m a little bit confused right now, after all the
> “Experiments” I already did to make this work.
> 
>> Your pass rule should let everything work though. Those two rules are
>> your entire ruleset?
> 
> Yes, those two rules are all I have (I reduced my whole rule set to this to 
> sort out things)
> In the meantime I changed it to the following as per your and Georgs 
> suggestion.
> 
> In file:
> pass log (all, to pflog0)
> # pass the GRE encapsulated traffic
> pass inet6 proto gre
> # let ping6 over gre(4) work
> pass on gre inet6 proto icmp6
> #pass on gre proto gre no state
> 
> 
> doas pfctl -s rules
> pass log (all) all flags S/SA
> pass inet6 proto gre all
> pass on gre inet6 proto ipv6-icmp all
> 
> With these rules I get, so at least I can see the reply on em0:
> 
> doas tcpdump -nvei em0 ip6 or icmp6 or proto gre
> tcpdump: listening on em0, link-type EN10MB
> 07:54:28.107820 00:0d:b9:44:ec:dc 34:81:c4:e0:4b:79 86dd 162: 
> 2a02::yyy:zzz::1 > 2a00:::::10: gre [] 86dd 
> 2a01:qqq::ss::2 > 2a01:qqq::ss::1: icmp6: echo request (id:597c 
> seq:0) (len 64, hlim 64) [flowlabel 0x71e6] (len 108, hlim 64)
> 07:54:28.156366 34:81:c4:e0:4b:79 00:0d:b9:44:ec:dc 86dd 170: 
> 2a00:::::10 > 2a02::yyy:zzz::1: DSTOPT (type 0x04: len=1) gre 
> [] 86dd 2a01:qqq::ss::1 > 2a01:qqq::ss::2: icmp6: echo reply (id:597c 
> seq:0) [flowlabel 0xa8f7b] (len 64, hlim 64) [flowlabel 0xa8f7b] (len 116, 
> hlim 243)
> 07:54:29.109744 00:0d:b9:44:ec:dc 34:81:c4:e0:4b:79 86dd 162: 
> 2a02::yyy:zzz::1 > 2a00:::::10: gre [] 86dd 
> 2a01:qqq::ss::2 > 2a01:qqq::ss::1: icmp6: echo request (id:597c 
> seq:1) (len 64, hlim 64) [flowlabel 0x71e6] (len 108, hlim 64)
> 07:54:29.166480 34:81:c4:e0:4b:79 00:0d:b9:44:ec:dc 86dd 170: 
> 2a00:::::10 > 2a02::yyy:zzz::1: DSTOPT (type 0x04: len=1) gre 
> [] 86dd 2a01:qqq::ss::1 > 2a01:qqq::ss::2: icmp6: echo reply (id:597c 
> seq:1) [flowlabel 0xa8f7b] (len 64, hlim 64) [flowlabel 0xa8f7b] (len 116, 
> hlim 243)
> 07:54:30.110067 00:0d:b9:44:ec:dc 34:81:c4:e0:4b:79 86dd 162: 
> 2a02::yyy:zzz::1 > 2a00:::::10: gre [] 86dd 
> 2a01:qqq::ss::2 > 2a01:qqq::ss::1: icmp6: echo request (id:597c 
> seq:2) (len 64, hlim 64) [flowlabel 0x71e6] (len 108, hlim 64)
> 07:54:30.156013 34:81:c4:e0:4b:79 00:0d:b9:44:ec:dc 86dd 170: 
> 2a00:::::10 > 2a02::yyy:zzz::1: DSTOPT (type 0x04: len=1) gre 
> [] 86dd 2a01:qqq::ss::1 > 2a01:qqq::ss::2: icmp6: echo reply (id:597c 
> seq:2) [flowlabel 0xa8f7b] (len 64, hlim 64) [flowlabel 0xa8f7b] (len 116, 
> hlim 243)
> 
> Unfortunately it never reaches gre0:
> 
> doas tcpdump -nvei gre1051 ip6 or icmp6 or proto gre
> tcpdump: listening on gre1051, link-type LOOP
> 07:54:28.107741 2a01:qqq::ss::2 > 2a01:qqq::ss::1: icmp6: echo 
> request (id:597c seq:0) [icmp6 cksum ok] (len 64, hlim 64)
> 07:54:29.109675 2a01:qqq::ss::2 > 2a01:qqq::ss::1: icmp6: echo 
> request (id:597c seq:1) [icmp6 cksum ok] (len 64, hlim 64)
> 07:54:30.110004 2a01:qqq::ss::2 > 2a01:qqq::ss::1: icmp6: echo 
> request (id:597c seq:2) [icmp6 cksum ok] (len 64, hlim 64)
> 
> 
>> The bare "pass" rule not letting this work makes me feel like there's
>> more to this though.
> 
> Yes, I also think that there must be more to it, but I just don’t see the 
> trees for the forest here.
> 
> Thanks
> Markus
>

Re: GRE IP6/IP6 not working as soon as pf is enabled

2022-01-15 Thread David Gwynne

On Sat, Jan 15, 2022 at 08:10:44PM +0100, Markus Wipp wrote:
> Hi all, 
> 
> This is my first mail to an OpenBSD list, so I hope I chose the correct one.
> 
> I'm trying to get a GRE tunnel in combination with pf working a few days now
> on my OpenBSD (OpenBSD 7.0 (GENERIC.MP) #232: Thu Sep 30 14:25:29 MDT 2021)
>  
> If I disable pf with pfctl -d the connection is working and I can ping.
> However as soon as I enable pf with pfctl -e the ping stops working (even 
> with a configuration that 
> should allow all traffic according to my understanding)
> 
> The GRE interface looks like:
> 
> gre0: flags=8051 mtu 1476
>   index 44 priority 0 llprio 6
>   encap: vnetid none txprio payload rxprio packet
>   groups: gre
>   tunnel: inet6 2a02::yyy:zzz::1 --> 2a00:::::10 ttl 64 
> nodf ecn
>   inet6 fe80::20d:b9ff:fe44:ecdc%gre1051 -->  prefixlen 64 scopeid 0x2c
>   inet6 2a01:qqq::ss::2 -->  prefixlen 128
> 
> The simplified pf-Rule looks like:
> 
> pass
> pass on gre proto gre no state

Hi Markus,

My first impression is that you're confusing where to apply policy to
the encapsulated traffic. "pass on gre proto gre" implies you're
trying to pass GRE packets as they go over gre(4) interfaces, but
it's the unencapsulated packets that go over gre(4), and the GRE
encapsulated packets will go over your "underlay" or physical
interfaces, which looks like em0 according to tcpdump.

You can see that from the tcpdump output below. When you tcpdump
on gre0 you only see icmp6 packets. That's the same as what pf sees.
When you tcpdump on em0 you see the GRE packets, which again, is what pf
will see. Note that pf will only look at the first protocol inside an IP
packet (ie, TCP, UDP, GRE, etc), it won't let you filter inside GRE
packets.

Your pass rule should let everything work though. Those two rules are
your entire ruleset?

Something like this might work better:

# pass the GRE encapsulated traffic
pass inet6 proto gre
# let ping6 over gre(4) work
pass on gre inet6 proto icmp6

The bare "pass" rule not letting this work makes me feel like there's
more to this though.

Hope this helps,
dlg

> 
> tcpdump shows the following:
> 
> doas tcpdump -nvei gre0 ip6 and icmp6 or proto gre 
> tcpdump: listening on gre0, link-type LOOP
> 19:29:15.124113 2a01:qqq::ss::2 > 2a01:qqq::ss::1: icmp6: echo 
> request (id:9e45 seq:18) [icmp6 cksum ok] (len 64, hlim 64)
> 19:29:16.124438 2a01:qqq::ss::2 > 2a01:qqq::ss::1: icmp6: echo 
> request (id:9e45 seq:19) [icmp6 cksum ok] (len 64, hlim 64)
> 19:29:17.124811 2a01:qqq::ss::2 > 2a01:qqq::ss::1: icmp6: echo request
> (id:9e45 seq:20) [icmp6 cksum ok] (len 64, hlim 64)
> 
> and
> 
> doas tcpdump -nvei em0 ip6 and icmp6 or proto gre 
> tcpdump: listening on em0, link-type EN10MB
> 19:51:06.126497 00:0d:b9:44:ec:dc 34:81:c4:e0:4b:79 86dd 162: 
> 2a02::yyy:zzz::1 > 2a00:::::10: gre [] 86dd 
> 2a01:qqq::ss::2 > 2a01:qqq::ss::1: icmp6: echo request (id:9e45 
> seq:1329) (len 64, hlim 64) [flowlabel 0x367f] (len 108, hlim 64)
> 19:51:07.126815 00:0d:b9:44:ec:dc 34:81:c4:e0:4b:79 86dd 162: 
> 2a02::yyy:zzz::11 > 2a00:::::10: gre [] 86dd 
> 2a01:qqq::ss::2 > 2a01:qqq::ss::1: icmp6: echo request (id:9e45 
> seq:1330) (len 64, hlim 64) [flowlabel 0x367f] (len 108, hlim 64)
> 19:51:08.127252 00:0d:b9:44:ec:dc 34:81:c4:e0:4b:79 86dd 162: 
> 2a02::yyy:zzz::1 > 2a00:::::10: gre [] 86dd 
> 2a01:qqq::ss::2 > 2a01:qqq::ss::1: icmp6: echo request (id:9e45 
> seq:1331) (len 64, hlim 64) [flowlabel 0x367f] (len 108, hlim 64)
> 
> 
> And 
> 
> doas tcpdump -nvei pflog0 
> tcpdump: WARNING: snaplen raised from 116 to 160
> tcpdump: listening on pflog0, link-type PFLOG
> 19:55:03.962579 rule 0/(ip-option) [uid 0, pid 74650] pass in on em0: 
> 2a00:::::10 > 2a02::yyy:zzz::1: DSTOPT (type 0x04: len=1) gre 
> [] 86dd [|ip6] [flowlabel 0xa8f7b] (len 116, hlim 243)
> 19:55:04.964864 rule 0/(ip-option) [uid 0, pid 74650] pass in on em0: 
> 2a00:::::10 > 2a02::yyy:zzz::1: DSTOPT (type 0x04: len=1) gre 
> [] 86dd [|ip6] [flowlabel 0xa8f7b] (len 116, hlim 243)
> 19:55:05.963947 rule 0/(ip-option) [uid 0, pid 74650] pass in on em0: 
> 2a00:::::10 > 2a02::yyy:zzz::1: DSTOPT (type 0x04: len=1) gre 
> [] 86dd [|ip6] [flowlabel 0xa8f7b] (len 116, hlim 243)
> 
> 
> Thanks in advance for any hints on how to solve this issue
> 
> Best regards
> Markus
>

Re: Issues with veb/vport and vlan interactions

2021-12-27 Thread David Gwynne

On Sun, Dec 26, 2021 at 07:46:01AM +, Simon Baker wrote:
> Hi,
> 
> Struggling a bit debugging something, and hoping someone can point me in the 
> right direction.

ok. after staring at this for a while im pretty sure it's an actual bug
rather than a misconfiguration.

> I've got 4 physical intel nics, all configured as part of a veb bridge.
> The veb bridge itself has two vports attached, one with an address and one 
> without:
> 
>   cat /etc/hostname.vport0 
>   inet 172.16.0.250 255.255.255.0
>   group trusted
>   up 
> 
>   cat /etc/hostname.vport1
>   group vlan-interface
>   link0

as an aside, link0 on a vport doesn't do anything.

>   up
> 
> The hostname.veb0 file contains this:
>   add em0
>   add em1
>   add em2
>   add em3
>   add vport0
>   add vport1
>   link0
>   up
> 
> This setup is working fine for all hosts on my main LAN, and everything is as 
> expected.  However I've tried and (partially) failed in adding some
> VLANs to the veb.
> 
> For example, here's one of the vlan configurations:
>   cat /etc/hostname.vlan210
>   inet 172.16.210.2 255.255.255.0 172.16.210.255 
>   parent vport1
>   vlan 210 
>   description "VLAN 210 - A/V & Media Devices"
>   up 
> 
> Note the following only discusses one VLAN, but the issue is present on all 
> of the configured VLANs.
> 
> From a host on the VLAN network, it can connect outbound to the internet 
> absolutely fine - but it cannot talk back to the main network.  Strangely, 
> running tcpdump on interfaces shows traffic moving as (possibly) expected - 
> but packets never seem to appear on the wire to the downstream host.
> 
> In the following example, Volumio is a host on the VLAN 210 as above, 
> attempting to send an ICMP echo request to a host on the main lan.  First up, 
> here's a PF log showing the permitted packet:
> 
> Dec 25 20:41:13.342006 rule 86/(match) pass out on vport0: 172.16.210.13 > 
> 172.16.0.1: icmp: echo request
> 
> (Note, I still get the same issues even with disabling pf)
> 
> Next, here's the packet on the vport1 interface from above:
> 
> 20:41:22.663129 dc:a6:32:4d:9a:4c fe:e1:ba:d3:54:a5 8100 102: 802.1Q vid 210 
> pri 1 volumio.av.kaizo.lan > nas.kaizo.lan: icmp: echo request (DF)
> 
> Now, here's the packet on the vport0 interface:
> 20:41:22.663145 fe:e1:ba:d2:e4:93 68:05:ca:4a:7c:18 ip 98: 
> volumio.av.kaizo.lan > nas.kaizo.lan: icmp: echo request
> 
> However, this is where it stops.  I see no matching packet on the veb0 
> interface, nor do I see a packet egress on the physical em1 interface, to 
> which the host "nas" is connected.  Obviously I don't see the packet
> on that host, either.
> 
> I'm a little perplexed as to what's going on here - it's almost as if
> the veb doesn't believe it's responsible for this packet.  It seems to be
> happily routing packets from the LAN to hosts on a VLAN, it's just the
> return traffic that never arrives.

you're right, the veb doesn't think it should handle the packet.

veb sets and clears a flag on packets going in and out of vport
interfaces as a sort of loop prevention mechanism. because vlan packets
are handled before veb can clear this flag, the packet ends up being
marked as inside veb when it goes through the network stack. when it
comes out the stack on a vport interface again, it gets dropped because
of this flag still being set.
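
a very simplified model of that sequence, to show why only the tagged
traffic gets lost. this is not the kernel code: the struct, the in_veb
field and all of the function names are hypothetical stand-ins for the
behaviour described above.

```c
#include <stdbool.h>

struct pkt {
	bool in_veb;	/* stand-in for the loop prevention flag */
	bool tagged;	/* packet carries an 802.1Q tag */
};

/* veb hands a packet to a vport: mark it as being inside the veb */
static void
veb_to_vport(struct pkt *p)
{
	p->in_veb = true;
}

/*
 * vport input towards the stack. a tagged packet is taken by vlan(4)
 * before the flag is cleared; an untagged packet has it cleared.
 */
static void
vport_input(struct pkt *p)
{
	if (p->tagged) {
		p->tagged = false;	/* vlan strips the tag, flag stays set */
		return;
	}
	p->in_veb = false;
}

/*
 * the stack transmits on a vport. a packet that is still marked looks
 * like a loop and is dropped. returns true if the packet is forwarded.
 */
static bool
vport_output(const struct pkt *p)
{
	return (!p->in_veb);
}

/* one packet in from the veb, routed, and back out another vport */
static bool
route_via_vports(bool tagged)
{
	struct pkt p = { .in_veb = false, .tagged = tagged };

	veb_to_vport(&p);
	vport_input(&p);
	return (vport_output(&p));
}
```

in this model route_via_vports(false) forwards the packet, while
route_via_vports(true) drops it, which matches the symptom of packets
from the vlan never reaching hosts on the main lan.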

there's a diff below that moves away from the flag to try and avoid this
problem. can you give it a go in your setup?

alternatively, i think you could use a separate veb per vlan, but
that's a lot of boilerplate...

> For completeness, below are output of ifconfig for the interfaces (edited).
> 
> Simon.
> 
> veb0: flags=9943
>index 12 llprio 3
>groups: veb
>em0 flags=3
>port 1 ifpriority 0 ifcost 0
>em1 flags=3
>port 2 ifpriority 0 ifcost 0
>em2 flags=3
>port 3 ifpriority 0 ifcost 0
>em3 flags=3
>port 4 ifpriority 0 ifcost 0
>vport0 flags=3
>port 19 ifpriority 0 ifcost 0
>vport1 flags=3
>port 20 ifpriority 0 ifcost 0
>Addresses (max cache: 100, timeout: 240):
>...snip...
>68:05:ca:4a:7c:18 em1 0 flags=0<>
>...snip...
>fe:e1:ba:d2:e4:93 vport0 0 flags=0<>
>fe:e1:ba:d3:54:a5 vport1 0 flags=0<>
> 
> vport0: flags=8943 mtu 1500
>lladdr fe:e1:ba:d2:e4:93
>index 19 priority 0 llprio 3
>groups: vport trusted
>inet 172.xx.xx.250 netmask 0xff00 broadcast 172.16.0.255
> 
> vport1: flags=9943 mtu 
> 1500
>lladdr fe:e1:ba:d3:54:a5
>index 20 priority 0 llprio 3
>groups: vport vlan-interface
> 
> vlan210: flags=8843 mtu 1500
>lladdr fe:e1:ba:d3:54:a5
>description: VLAN 210 - A/V & Media

Re: Openbsd VMM with VLAN

2021-06-01 Thread David Gwynne

Hi Irshad,

Assuming I understand your layout correctly, you should be able to use 
hostname.if configuration files like the following:

$ cat hostname.em0
up

$ cat hostname.vlan20
description "Trusted (L2+L3)"
vnetid 20 parent em0
inet aa.bb.cc.dd 255.255.255.0
up

$ cat hostname.vlan10
description "IoT (L2)"
vnetid 10 parent em0
up

$ cat hostname.veb10
description "IoT bridge"
add vlan10
add vport10
up

$ cat hostname.vport10
description "IoT (L3)"
inet ee.bb.cc.dd 255.255.255.0
up

With the above, vlan10 on the wire will be connected using veb10 to the IP 
stack on your firewall on vport10. To have the virtual machine also plug into 
that VLAN 10 Ethernet segment, you can use veb10 as your "uplink" switch 
interface in vm.conf.

dlg

> On 31 May 2021, at 05:44, Irshad  wrote:
> 
> Hi all 
> 
> 
> i have two Openbsd box Running Like Below one As Firewall and Another one As
> VMM
> With two VLAN's
> 
>OPENBSD_FIREWALL
> 
> IoT_AP  (VLAN10) . -VLAN10
>|--OpenWRT-em0---| ---pf --em1--Internet
>||- VLAN20 
> trusted_AP(VLAN20)  
> this Works fine  
> 
> 
> Another Separate OpenBSD Box for VM 
> 
> openbsd(vmGuest)---vether0---openbsdHost——NAT—em0--OpenBSD_FW--Internet
> 
> is it possible Add openbsd(vmguest) to VLAN10 network 
> 
> 
> this is MY vm config [HomeAssistance]
> 
> 
> 
> switch "uplink" {
>interface bridge1
> }
> vm "hass" {
>disable
>owner irshad
>memory 2G
>disk "/home/irshad/iso/disk.qcow2"
> 
>interface {
>switch "uplink"
>lladdr fe:e1:bb:01:01:01
>}
> }
> 
> 
>

Re: Home Assistant

2021-05-11 Thread David Gwynne




> On 11 May 2021, at 05:01, pas...@pascallen.nl wrote:
> 
> Dear David,
> 
> How do you start homeassistant after a reboot? Manually?

i have these scripts. the pexp in the rc script doesnt work, but i havent 
needed it to yet.

apathy$ cat /etc/rc.d/hass   
#!/bin/ksh

daemon="/opt/local/sbin/hass --daemon"
daemon_user="_hass"

pexp="/opt/hass/bin/hass"

. /etc/rc.d/rc.subr

rc_reload=NO

rc_cmd $1
apathy$ cat /opt/local/sbin/hass 
#!/bin/ksh

. /opt/hass/bin/activate

/opt/hass/bin/hass "$@"

Re: pf ipv6 source-routing 6.9

2021-05-10 Thread David Gwynne

> On 10 May 2021, at 8:05 pm, Bastien Durel  wrote:
> 
> Le samedi 08 mai 2021 à 12:07 +0200, Bastien Durel a écrit :
>> Le 08/05/2021 à 11:56, Stuart Henderson a écrit :
> Does it work if you use the syntax suggested in the upgrade
> notes
> for the example with "pass in on pppoe1 reply-to ..."?
> 
> 
 For incoming connections, I tried

 pass in on pppoe0 inet6 reply-to fe80::520f:80ff:fe65:8800%pppoe0
 keep state
 pass in on pppoe0 inet6 reply-to fe80::520f:80ff:fe65:8800 keep
 state
> 
> Hello,
> 
> Thanks to folks of #openbsd, I found out adding an explicit route to
> fe80::520f:80ff:fe65:8800 on pppoe0 makes this work.
> Referencing fe80::520f:80ff:fe65:8800%pppoe0 in pf.conf results in a
> rule referencing fe80::520f:80ff:fe65:8800
> 
> pf.conf:
> pass in on pppoe0 inet6 reply-to fe80::520f:80ff:fe65:8800%pppoe0
> pfctl -s rules:
> pass in on pppoe0 inet6 all flags S/SA reply-to fe80::520f:80ff:fe65:8800
> 
> hostname.pppoe0:
> !/sbin/route add -inet6 fe80::520f:80ff:fe65:8800 -ifp pppoe0 fe80::%pppoe0
> 
> This makes pf able to route to the correct interface.

You're right, pf isn't very good at handling link-local v6 addresses. This is 
annoying now that route-to uses addresses as its argument if you want to move 
ipv6 packets toward a host with a link local address.

In this situation the least worst way to cope with the problem for now is to 
use route-to (pppoe0:0). This should work because route-to doesn't do any local 
address checks on the destination address it resolves. Once it looks up the 
local address as the direction to send the packet, it should put it straight 
out pppoe0. ppp as a tunnel interface has no address resolution protocol, it 
just encapsulates the packet it is given and sends it on its way.

route-to also takes a destination address as an argument, not a gateway 
address. If dhcp6c sets up a route to some global address that you know about 
(I'm not sure this is a thing but it might be), you can use that global address 
as the argument to route-to and it will send it in the right direction.

dlg

> Regards,
> 
> -- 
> Bastien
>

Re: virtual cluster with rdomain(4)

2021-05-10 Thread David Gwynne

he: 100, timeout: 240):
fe:e1:ba:d2:4a:be vport1 16 flags=0<>
fe:e1:ba:d3:17:a0 vport2 16 flags=0<>
ix#

dlg

> 
> thanks
> Thomas
> 
> On Mon, 10 May 2021 at 08:10, David Gwynne  wrote:
> >
> > Hi Thomas,
> >
> > I'd give this a go with vport(4) interfaces instead of vether(4), and
> join them all together at layer 2 by adding them to a single veb(4).
> >
> > Cheers,
> > dlg
> >
> > > On 10 May 2021, at 03:04, Thomas Huber  wrote:
> > >
> > > Hi misc,
> > >
> > > I wanted to tinker with the cluster manager sysutils/nomad but
> > > unfortunately I've no spare cluster for tinkering...
> > >
> > > So I had the idea of utilizing OpenBSDs outstanding
> > > possibilities for network isolation to create a
> > > virtual cluster on my VM at openbsd.amsterdam.
> > >
> > > I had different ideas to achieve it but nothing worked so far.
> > > So I'd describe my first approach because I think this is the
> > > most OpenBSD idiomatic one:
> > >
> > > I created 5 vether[0-4] devices, everyone in its own rdomain [0-4]
> > > and assigned every device its own inet address space 10.10.[0-4].1/24
> > >
> > > I also set the 10.10.[0-4].1 as default route in each rtable.
> > >
> > > Now I learned that pf(4) is needed to route between this 5 rdomains
> > > but after several attempts I've no clue how this could be defined.
> > >
> > > Actually I wanted rdomain 0 to work as hub for all rdomains >0.
> > > Maybe someone can hint me in the right direction
> > >
> > > regards
> > > Thomas (host of the u2k20-hackathon, if someone remembers ;-)
> > >
> > > some further listings if my description above is unclear:
> > >
> > >
> > > ud$ ifconfig vether
> > > vether0: flags=8843 mtu 1500
> > > lladdr fe:e1:ba:d7:cc:16
> > > index 23 priority 0 llprio 3
> > > groups: vether
> > > media: Ethernet autoselect
> > > status: active
> > > inet 10.10.0.1 netmask 0xff00 broadcast 10.255.255.255
> > >
> > > vether1: flags=8843 rdomain 1
> mtu
> > > 1500
> > > lladdr fe:e1:ba:d8:73:32
> > > index 24 priority 0 llprio 3
> > > groups: vether
> > > media: Ethernet autoselect
> > > status: active
> > > inet 10.10.1.1 netmask 0xff00 broadcast 10.255.255.255
> > >
> > > vether2: flags=8843 rdomain 2
> mtu
> > > 1500
> > > lladdr fe:e1:ba:d9:bd:e8
> > > index 26 priority 0 llprio 3
> > > groups: vether
> > > media: Ethernet autoselect
> > > status: active
> > > inet 10.10.2.1 netmask 0xff00 broadcast 10.255.255.255
> > >
> > > vether3: flags=8843 rdomain 3
> mtu
> > > 1500
> > > lladdr fe:e1:ba:da:07:4d
> > > index 28 priority 0 llprio 3
> > > groups: vether
> > > media: Ethernet autoselect
> > > status: active
> > > inet 10.10.3.1 netmask 0xff00 broadcast 10.255.255.255
> > >
> > > vether4: flags=8843 rdomain 4
> mtu
> > > 1500
> > > lladdr fe:e1:ba:db:31:c8
> > > index 30 priority 0 llprio 3
> > > groups: vether
> > > media: Ethernet autoselect
> > > status: active
> > > inet 10.10.4.1 netmask 0xff00 broadcast 10.255.255.255
> > >
> > > ud$ netstat -R
> > > Rdomain 0
> > >  Interfaces: lo0 vio0 enc0 pflog0 vether0
> > >  Routing tables: 0 71
> > >
> > > Rdomain 1
> > >  Interfaces: vether1 lo1
> > >  Routing table: 1
> > >
> > > Rdomain 2
> > >  Interfaces: vether2 lo2
> > >  Routing table: 2
> > >
> > > Rdomain 3
> > >  Interfaces: vether3 lo3
> > >  Routing table: 3
> > >
> > > Rdomain 4
> > >  Interfaces: vether4 lo4
> > >  Routing table: 4

Re: Home Assistant

2021-05-10 Thread David Gwynne

ive been running hass on openbsd for a while now, and just did a new
install on 6.9 for my boss on the weekend.

i set up a _hass user for it to run as, and gave it /opt/hass:

hass$ getent passwd _hass
_hass:*:2000:2000:Home Assistant:/opt/hass:/sbin/nologin
hass$ getent group 2000
_hass:*:2000
hass$ ls -ld /opt/hass
drwxr-xr-x  8 _hass  _hass  512 May  8 22:35 /opt/hass

i installed mosquitto, python3.8, py3-virtualenv, py3-pip,
py3-cryptography, py3-Pillow, and py3-zeroconf from ports. then
as the _hass user i set up a venv in /opt/hass with virtualenv
--system-site-packages /opt/hass, did the . /opt/hass/bin/activate
thing, then ran pip install homeassistant.

that got me far enough to be able to start home assistant. you're
on your own after this.

good luck.

dlg

On Sat, May 08, 2021 at 06:53:54PM +0200, pas...@pascallen.nl wrote:
> Dear all,
> 
> What would be the best way to install HASS on Openbsd?
> Containers are a nogo?
> 
> Run it in virtual env from python?
> 
> Any Howto on the subject with Openbsd?
> 
> 
> Currently I got it running as from the website with the "core" version.
> But a startup script which runs with a non-root user is where I get
> stuck.
> 
> 
> 
> 
> -- 
> Met vriendelijke groet,
> 
> Pascal Huisman
> 
> 
> Fundamentally, there may be no basis for anything.
>

Re: virtual cluster with rdomain(4)

2021-05-10 Thread David Gwynne

Hi Thomas,

I'd give this a go with vport(4) interfaces instead of vether(4), and join them 
all together at layer 2 by adding them to a single veb(4).

Cheers,
dlg

> On 10 May 2021, at 03:04, Thomas Huber  wrote:
> 
> Hi misc,
> 
> I wanted to tinker with the cluster manager sysutils/nomad but
> unfortunately I´ve no spare cluster for tinkering...
> 
> So I had the idea of utilizing OpenBSDs outstanding
> possibilities for network isolation to create a
> virtual cluster on my VM at openbsd.amsterdam.
> 
> I had different ideas to achieve it but nothing worked so far.
> So I'd describe my first approach because I think this is the
> most OpenBSD idiomatic one:
> 
> I created 5 vether[0-4] devices, everyone in its own rdomain [0-4]
> and assigned every device its own inet address space 10.10.[0-4].1/24
> 
> I also set the 10.10.[0-4].1 as default route in each rtable.
> 
> Now I learned that pf(4) is needed to route between this 5 rdomains
> but after several attempts I've no clue how this could be defined.
> 
> Actually I wanted rdomain 0 to work as hub for all rdomains >0.
> Maybe someone can hint me in the right direction
> 
> regards
> Thomas (host of the u2k20-hackathon, if someone remembers ;-)
> 
> some further listings if my description above is unclear:
> 
> 
> ud$ ifconfig vether
> vether0: flags=8843 mtu 1500
> lladdr fe:e1:ba:d7:cc:16
> index 23 priority 0 llprio 3
> groups: vether
> media: Ethernet autoselect
> status: active
> inet 10.10.0.1 netmask 0xff00 broadcast 10.255.255.255
> 
> vether1: flags=8843 rdomain 1 mtu
> 1500
> lladdr fe:e1:ba:d8:73:32
> index 24 priority 0 llprio 3
> groups: vether
> media: Ethernet autoselect
> status: active
> inet 10.10.1.1 netmask 0xff00 broadcast 10.255.255.255
> 
> vether2: flags=8843 rdomain 2 mtu
> 1500
> lladdr fe:e1:ba:d9:bd:e8
> index 26 priority 0 llprio 3
> groups: vether
> media: Ethernet autoselect
> status: active
> inet 10.10.2.1 netmask 0xff00 broadcast 10.255.255.255
> 
> vether3: flags=8843 rdomain 3 mtu
> 1500
> lladdr fe:e1:ba:da:07:4d
> index 28 priority 0 llprio 3
> groups: vether
> media: Ethernet autoselect
> status: active
> inet 10.10.3.1 netmask 0xff00 broadcast 10.255.255.255
> 
> vether4: flags=8843 rdomain 4 mtu
> 1500
> lladdr fe:e1:ba:db:31:c8
> index 30 priority 0 llprio 3
> groups: vether
> media: Ethernet autoselect
> status: active
> inet 10.10.4.1 netmask 0xff00 broadcast 10.255.255.255
> 
> ud$ netstat -R
> Rdomain 0
>  Interfaces: lo0 vio0 enc0 pflog0 vether0
>  Routing tables: 0 71
> 
> Rdomain 1
>  Interfaces: vether1 lo1
>  Routing table: 1
> 
> Rdomain 2
>  Interfaces: vether2 lo2
>  Routing table: 2
> 
> Rdomain 3
>  Interfaces: vether3 lo3
>  Routing table: 3
> 
> Rdomain 4
>  Interfaces: vether4 lo4
>  Routing table: 4

Re: Working with encapsulated traffic using PF (pass incoming IPv4 from IPv6 gif tunnel)

2021-04-14 Thread David Gwynne



> On 9 Apr 2021, at 18:55, Martin  wrote:
> 
> Hello list,
> 
> I have working IPv4 OpenBSD router. There are no problems with native IPv4 
> and IPv6 traffic filtering/redirecting at all.
> 
> Now stuck with filtering IPv4 traffic encapsulated in IPv6 tunnel using gif 
> interface.
> 
> IPv6 interface is tun0 which has assigned unique IPv6 address, and gif0 has 
> the same unique IPv6 as tun0 with wrapped IPv4 into IPv6 as shown in the configs.
> 
> The same configuration from the opposite side, except IPv4 and IPv6 source 
> and destination addresses reversed to make a tunnel.
> 
> I'm not sure if I needed to use a bridge between tun0 and gif0 to have it 
> working.
> 
> Looking for appropriate PF filtering rule to pass IPv4 encapsulated traffic 
> appearing on tun0 that is blocked by the "block all" PF rule for some reason.
> 
> Any ideas welcome.
> 
> === Side-a ===
> 
> # cat /etc/hostname.gif0
> # gif0
> up
> description 'IPv4 over IPv6 tunnel'
> # tunnel [src IPv6] [dst IPv6]
> tunnel :::::18b5 :::::a503
> inet alias 10.190.0.1
> dest 10.190.0.2
> 
> # ifconfig tun0
> tun0: flags=8051 mtu 1500
>index 44 priority 0 llprio 3
>groups: tun
>status: active
>inet6 fe80::5054:ffc:fe04:f824%tun0 ->  prefixlen 64 scopeid 0x2c
>inet6 :::::18b5 ->  prefixlen 48
> 
> === Side-b ===
> 
> # cat /etc/hostname.gif0
> # gif0
> up
> description 'IPv4 over IPv6 tunnel'
> # tunnel [src IPv6] [dst IPv6]
> tunnel :::::a503 :::::18b5
> inet alias 10.190.0.2
> dest 10.190.0.1
> 
> # ifconfig tun0
> tun0: flags=8051 mtu 1500
>index 44 priority 0 llprio 3
>groups: tun
>status: active
>inet6 fe80::2a15:f3af:fefb:a3b0%tun0 ->  prefixlen 64 scopeid 0x2c
>inet6 :::::a503 ->  prefixlen 48
> 

Hi Martin,

bridge(4) only works with Ethernet interfaces; there is no equivalent to 
bridge(4) for tunnels. I don't think that's related or necessary for solving 
your problem though.

Without a look at your IPv6 routing table it's hard to tell what could be 
happening here. My first impression is that your routers don't have routes for 
the IPv6 endpoints over the tun0 interfaces. For this to work, I'd expect to 
see something like this in your tun0 output:

=== Side-a ===

# ifconfig tun0
tun0: flags=8051 mtu 1500
   index 44 priority 0 llprio 3
   groups: tun
   status: active
   inet6 fe80::5054:ffc:fe04:f824%tun0 ->  prefixlen 64 scopeid 0x2c
   inet6 :::::18b5 -> :::::a503 prefixlen 128

and:

=== Side-b ===

# ifconfig tun0
tun0: flags=8051 mtu 1500
   index 44 priority 0 llprio 3
   groups: tun
   status: active
   inet6 fe80::2a15:f3af:fefb:a3b0%tun0 ->  prefixlen 64 scopeid 0x2c
   inet6 :::::a503 -> :::::18b5 prefixlen 128

This isn't strictly necessary though; the important thing is that the route to 
the dst IPv6 endpoint is over tun0. You should be able to check if that is the 
case by running "route get [dst IPv6]" and looking for tun0 in the "interface:" 
line. You should also be able to ping6 between the IPv6 tunnel endpoints. If 
ping6 isn't working, then I wouldn't expect gif traffic to work either.

Cheers,
dlg

Re: divert with rdr-to not working properly

2021-04-07 Thread David Gwynne

On Mon, Apr 05, 2021 at 09:51:53AM +0300, Hakan SARIMAN wrote:
> Hello Misc,
> 
> 
> I think divert-packet feature with NAT/NAPT is broken.
> 
> I can not reach to web server when I use divert-packet with rdr-to.
> 
> Is this a known bug or a new issue?

There's no other options? Just those two?

I think it's been around for a long time, but no one's hurt themselves
with it because they haven't combined nat/rdr with divert-packet
yet.

I believe the diff below will fix the bug. There's some discussion going
on behind the scenes about whether this is the right fix though.

> 
> When I use divert-packet + rdr-to here is the situation:
> 
> 
> # MY PF RULES
> 
> pass in log quick on pppoe0 inet proto tcp from any to (pppoe0:0) port 81
> rdr-to 10.10.12.27 port 81
> 
> pass out log quick on vport12 inet proto tcp from any to 10.10.12.27 port
> 81 divert-packet port 700

Index: pf.c
===
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.1112
diff -u -p -r1.1112 pf.c
--- pf.c        23 Feb 2021 11:43:40 -  1.1112
+++ pf.c        5 Apr 2021 10:16:31 -
@@ -6848,8 +6848,10 @@ pf_test(sa_family_t af, int fwdir, struc
         if ((*m0)->m_pkthdr.pf.flags & PF_TAG_GENERATED)
                 return (PF_PASS);
 
-        if ((*m0)->m_pkthdr.pf.flags & PF_TAG_DIVERTED_PACKET)
+        if ((*m0)->m_pkthdr.pf.flags & PF_TAG_DIVERTED_PACKET) {
+                CLR((*m0)->m_pkthdr.pf.flags, PF_TAG_DIVERTED_PACKET);
                 return (PF_PASS);
+        }
 
         if ((*m0)->m_pkthdr.pf.flags & PF_TAG_REFRAGMENTED) {
                 (*m0)->m_pkthdr.pf.flags &= ~PF_TAG_REFRAGMENTED;
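
To illustrate what the CLR() in the diff changes, here is a toy model in
plain C. It is only a sketch and not pf code: struct toy_pkt,
TOY_TAG_DIVERTED_PACKET (and its value) and both toy_* functions are made
up for the example. It assumes a reinjected packet can reach pf_test()
more than once, for example once on the way in and again on the way out.

```c
#include <stdint.h>

/* Same shape as the SET/CLR/ISSET helpers in OpenBSD's <sys/param.h>. */
#define SET(t, f)	((t) |= (f))
#define CLR(t, f)	((t) &= ~(f))
#define ISSET(t, f)	((t) & (f))

/* Hypothetical flag value and packet type, for illustration only. */
#define TOY_TAG_DIVERTED_PACKET	0x10

struct toy_pkt {
	uint8_t	flags;
};

/*
 * Toy stand-in for the early exit in pf_test(). Returns 1 when the
 * packet skips rule evaluation, 0 when it would be filtered normally.
 * clear_on_pass selects the patched (1) or unpatched (0) behaviour.
 */
int
toy_pf_test(struct toy_pkt *p, int clear_on_pass)
{
	if (ISSET(p->flags, TOY_TAG_DIVERTED_PACKET)) {
		if (clear_on_pass)
			CLR(p->flags, TOY_TAG_DIVERTED_PACKET);
		return (1);
	}
	return (0);
}

/* Count how many of n consecutive passes a reinjected packet skips. */
int
toy_skipped_passes(int n, int clear_on_pass)
{
	struct toy_pkt p = { 0 };
	int i, skipped = 0;

	SET(p.flags, TOY_TAG_DIVERTED_PACKET);
	for (i = 0; i < n; i++)
		skipped += toy_pf_test(&p, clear_on_pass);
	return (skipped);
}
```

In this model the unpatched behaviour lets the packet skip every pass it
goes through, while the patched one only skips the first pass after
reinjection, so later passes get to evaluate rules and apply state
(including any translation) again.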

Re: What determines source IP of traffic from OpenBSD box ?

2021-02-28 Thread David Gwynne

On Sun, Feb 28, 2021 at 01:17:01PM +0100, Rachel Roch wrote:
> 
> 
> 
> 28 Feb 2021, 11:28 by s...@spacehopper.org:
> 
> > On 2021/02/28 11:46, Rachel Roch wrote:
> >
> >> Thank you all for the suggestions, I am currently testing a few of them.
> >>
> >> In case it makes any difference, the underlying problem I have is I have 
> >> two firewalls with BGP upstreams, one acting as primary, one as standby. 
> >> So the problem I am seeing is the age-old problem of asymmetric traffic to 
> >> the secondary firewall meaning pkg_add on the secondary doesn't work.
> >>
> >
> > You can't just get two sessions from your upstreams so they can both be
> > active rather than one in standby?
> >
> 
> Maybe my wording is a little off.
> 
> I do have independent sessions from FW1 and FW2 to upstream routers.
> 
> The problem, I suspect, is more to do with overlapping of IP ranges being 
> advertised to upstreams, and hence traffic never making it back to FW2 
> because FW1 picks it up, hence the desire to have an effective way to tell 
> OpenBSD "send all localhost originating traffic from lo2 because the IPs on 
> lo2 are exclusive to that host".

I have a situation like that at work which I solved using the following
rules:

# let us talk to things
  match out on vlan363 to !vlan363:network !received-on any nat-to lo1
  match out on vlan364 to !vlan364:network !received-on any nat-to lo1
  pass out !received-on any

vlan363 and vlan364 are the links I use to talk to the rest of the
world.

There may be a less worse way to do that with the routing table now
though.

Re: seeing carp interface state change for unknown reason ; cluestick hunting

2021-02-01 Thread David Gwynne




> On 1 Feb 2021, at 6:02 pm, Bryan Stenson  wrote:
> 
> Hi all -
> 
> I'm trying to setup a pair of ERL3 octeon routers in master/standby
> mode via carp/pfsync to route traffic from my internal lan to the
> internet.  I've seen strange behavior wrt carp on these machines, so
> in an attempt to reduce the problem, I've removed one completely.
> 
> Even with only a single box (ERL3-01) on the network configured as a
> carp member, the carp interface state periodically changes (as seen
> from ifstated(8)).
> 
> I'm wondering if disconnecting the other ERL3 device is a valid isolated test.
> 1.  Will/might this cause issues with the carp device, as it cannot
> determine state from any other host?

If carp state flaps around while it is the only device on the network, that 
would imply the parent device is flapping around.

> 2.  Will/might this cause issues as it cannot send/receive pfsync
> updates (the other node is disconnected).

pfsync doesn't really care about carp state.

> 3.  Is there something else in my setup causing carp to fail here?

I'd be running "route monitor" and looking for link state changes on the carp 
parent interface.

> 4.  Could this be hardware/temperature related to this ERL3?  Wouldn't
> I see an additional error in dmesg if the physical device (cnmac2)
> failed periodically?
> 
> I'd appreciate any pointers here...I feel like I'm missing something dumb.

My first ideas are above. If it turns out the carp parent is stable we can try 
to come up with something else.

dlg

> 
> Thanks in advance.
> 
> Bryan
> 
> Here are some of my configs.  If I've missed including something
> critical to help describe my setup, please let me know and I'll add
> it.
> 
> ## Help me OBSD-Misc Kenobi.  You're my only hope. ##
> 
> erl3-01# uname -a
> OpenBSD erl3-01.siliconvortex.com 6.8 GENERIC#522 octeon
> 
> erl3-01# dmesg
> ...
> carp1: state transition: BACKUP -> MASTER
> carp1: state transition: BACKUP -> MASTER
> carp1: state transition: BACKUP -> MASTER
> carp1: state transition: BACKUP -> MASTER
> carp1: state transition: BACKUP -> MASTER
> carp1: state transition: BACKUP -> MASTER
> 
> erl3-01# tail mbox
> Mon, 1 Feb 2021 06:49:26 + (UTC)
> From: Charlie Root 
> Date: Mon, 1 Feb 2021 06:49:25 + (UTC)
> To: root@localhost
> Subject: carp master changed
> Message-ID: <515eb74cff427...@erl3-01.siliconvortex.com>
> Status: RO
> 
> master is now erl3-01.siliconvortex.com
> 
> 
> erl3-01# sysctl -a | grep carp
> net.inet.carp.allow=1
> net.inet.carp.preempt=1
> net.inet.carp.log=2
> 
> erl3-01# cat /etc/hostname.carp1
> #carp for lan side
> 192.168.122.1/23 carpdev vlan100 vhid 1 pass somethinglongandsecret
> 
> erl3-01# cat /etc/hostname.vlan100
> vnetid 100 parent cnmac2
> up
> 
> erl3-01# cat /etc/hostname.cnmac2
> inet 192.168.1.253 255.255.254.0
> 
> erl3-01# cat /etc/hostname.pfsync0
> up syncdev cnmac1
> 
> erl3-01# cat /etc/hostname.cnmac1
> inet 10.10.200.1 255.255.255.252
> 
> erl3-01# cat /etc/ifstated.conf
> # Initial State
> init-state auto
> 
> # Macros
> if_carp_up="carp1.link.up"
> if_carp_down="!carp1.link.up"
> 
> state auto {
>  if $if_carp_up {
>set-state master
>  }
> 
>  if $if_carp_down {
>set-state backup
>  }
> }
> 
> state master {
>  init {
>run "echo master is now `hostname` | mail -s 'carp master changed'
> root@localhost"
> }
> 
>  if $if_carp_down {
>set-state backup
>  }
> }
> 
> state backup {
>  init {
>run "echo backup is now `hostname` | mail -s 'carp master changed' root@localhost"
>  }
> 
>  if $if_carp_up {
>set-state master
>  }
> }
> 
> erl3-01# cat /etc/pf.conf
> # adopted from https://www.openbsd.org/faq/pf/example1.html
> wan_dev = cnmac0
> lan_dev = cnmac2
> carp_dev = vlan100
> pfsync_dev = cnmac1
> table  { 0.0.0.0/8 10.0.0.0/8 127.0.0.0/8 169.254.0.0/16 \
>172.16.0.0/12 192.0.0.0/24 192.0.2.0/24 224.0.0.0/3 \
>192.168.0.0/16 198.18.0.0/15 198.51.100.0/24\
>203.0.113.0/24 }
> 
> # carp
> pass quick on $lan_dev proto carp keep state (no-sync)
> 
> # pfsync
> pass quick on $pfsync_dev proto pfsync keep state (no-sync)
> 
> set block-policy drop
> set loginterface $wan_dev
> set skip on lo0
> 
> match in all scrub (no-df random-id max-mss 1440)
> 
> # redirect DNS queries to localhost
> pass in quick on { $carp_dev $lan_dev } proto { udp tcp } from any to
> any port domain rdr-to 192.168.1.253 port domain
> 
> # NAT to the world
> match out on $wan_dev inet from !($wan_dev:network) to any nat-to ($wan_dev:0)
> 
> antispoof quick for { $wan_dev }
> 
> # martians
> block in quick on $wan_dev from  to any
> block return out quick on $wan_dev from any to 
> 
> block all
> 
> # manage buffer bloat
> queue outq on $wan_dev flows 1024 bandwidth 3M max 3M qlimit 1024 default
> queue inq on $lan_dev flows 1024 bandwidth 45M max 45M qlimit 1024 default
> 
> pass out quick inet
> 
> pass in on { $carp_dev $lan_dev } inet
>

Re: Switching from trunk(4) to aggr(4)

2020-12-15 Thread David Gwynne

On Tue, Dec 15, 2020 at 06:43:12PM -0500, Daniel Jakots wrote:
> On Tue, 15 Dec 2020 14:30:16 +1000, David Gwynne 
> wrote:
> 
> > Can you try tcpdump -p -veni em0 -D in and see if any LACP packets
> > appear to come in on the port? If not, can you remove the -p and see
> > if em0 starts to work?
> > 
> > There are two main differences between how aggr(4) and trunk(4)
> > work. The first you've already found, which is that trunk(4) uses
> > the address from one of the ports it's given, while aggr(4) generates
> > one when it's created. The second difference is that trunk(4) makes
> > member ports promisc, while aggr(4) tries to be a lot more precise
> > and takes care to program the ports properly. This means that in your
> > environment em(4) has to support changing its MAC address to the one
> > provided by aggr(4), and it has to support joining multicast groups
> > properly, including the one that LACP packets are sent to.
> > 
> > tcpdump with -p means that it won't make the interface promiscuous.
> > If you don't see LACP packets come in while the port is not promisc,
> > that means the multicast filter isn't working properly. It should
> > start working if you're running tcpdump without -p on the em(4)
> > ports, or on aggr(4) itself.
> 
> 
> Thanks for your reply!
> 
> Here's what I did (spoiler alert, I couldn't get aggr0 to work):
> 
> I switched back the hostname files, and rebooted.
> 
> During boot:
> 
> starting network
> aggr0 em0 trunkport: creating port
> aggr0 em0 mux: BEGIN (BEGIN) -> DETACHED
> aggr0 em0 rxm: BEGIN (BEGIN) -> INITIALIZE
> aggr0 em0 rxm: INITIALIZE (UCT) -> PORT_DISABLED
> aggr0 em1 trunkport: creating port
> aggr0 em1 mux: BEGIN (BEGIN) -> DETACHED
> aggr0 em1 rxm: BEGIN (BEGIN) -> INITIALIZE
> aggr0 em1 rxm: INITIALIZE (UCT) -> PORT_DISABLED
> aggr0 em2 trunkport: creating port
> aggr0 em2 mux: BEGIN (BEGIN) -> DETACHED
> aggr0 em2 rxm: BEGIN (BEGIN) -> INITIALIZE
> aggr0 em2 rxm: INITIALIZE (UCT) -> PORT_DISABLED
> vlan10: no linkaggr0 em0 rxm: PORT_DISABLED (port_enabled) ->
> EXPIRED .aggr0 em2 rxm: PORT_DISABLED (port_enabled) -> EXPIRED
> aggr0 em1 rxm: PORT_DISABLED (port_enabled) -> EXPIRED
> ..aggr0 em0 rxm: EXPIRED (current_while_timer expired) -> DEFAULTED
> aggr0 em2 rxm: EXPIRED (current_while_timer expired) -> DEFAULTED
> aggr0 em1 rxm: EXPIRED (current_while_timer expired) -> DEFAULTED
> ... sleeping
> 
> root@pancake:~# tcpdump -p -veni em0 -D in
> tcpdump: listening on em0, link-type EN10MB
> 18:04:03.996369 80:56:f2:b7:9c:09 ff:ff:ff:ff:ff:ff 8100 60: 802.1Q vid 70 
> pri 1 arp who-has 10.70.70.254 tell 10.70.70.101
> 18:04:04.016123 00:17:10:8e:44:a5 ff:ff:ff:ff:ff:ff 8100 64: 802.1Q vid 10 
> pri 1 arp who-has 24.48.69.20 tell 24.48.69.1
> 18:04:04.034874 00:17:10:8e:44:a5 ff:ff:ff:ff:ff:ff 8100 64: 802.1Q vid 10 
> pri 1 arp who-has 24.48.69.109 tell 24.48.69.1
> 
> (vlan10 is my uplink to my isp's modem), I didn't have anything but
> those arp who-has.
> 
> root@pancake:~# ifconfig aggr0 -> still no carrier
> 
> root@pancake:~# tcpdump -veni em0 -D in
> tcpdump: listening on em0, link-type EN10MB
> 18:05:11.247455 52:54:00:06:aa:01 00:0d:b9:43:9f:fc 8100 1423: 802.1Q vid 20 
> pri 1 10.10.10.44.5638 > 198.48.202.251.25826: udp 1377 (ttl 64, id 2495, len 
> 1405)
> 18:05:11.248427 52:54:00:06:aa:01 00:0d:b9:43:9f:fc 8100 1390: 802.1Q vid 20 
> pri 1 10.10.10.44.5638 > 198.48.202.251.25826: udp 1344 (ttl 64, id 47470, 
> len 1372)
> 18:05:11.249478 52:54:00:06:aa:01 00:0d:b9:43:9f:fc 8100 1424: 802.1Q vid 20 
> pri 1 10.10.10.44.5638 > 198.48.202.251.25826: udp 1378 (ttl 64, id 57431, 
> len 1406)
> 18:05:11.570690 00:17:10:8e:44:a5 ff:ff:ff:ff:ff:ff 8100 64: 802.1Q vid 10 
> pri 1 arp who-has 184.161.78.225 tell 184.161.78.1
> 18:05:11.586920 00:17:10:8e:44:a5 ff:ff:ff:ff:ff:ff 8100 64: 802.1Q vid 10 
> pri 1 arp who-has 192.222.131.28 tell 192.222.131.1
> 18:05:12.050180 00:17:10:8e:44:a5 ff:ff:ff:ff:ff:ff 8100 64: 802.1Q vid 10 
> pri 1 arp who-has 24.48.76.202 tell 24.48.76.1
> 
> nothing else than those udp packets (my collectd setup) and the
> arp who-has
> 
> root@pancake:~# ifconfig aggr0 -> still no carrier
> 
> At that point I thought "sthen asked me to try to reboot the switch,
> let's do it now" and shortly after I got in my console
> aggr0 em0 rxm: DEFAULTED (!port_enabled) -> PORT_DISABLED
> aggr0 em1 rxm: DEFAULTED (!port_enabled) -> PORT_DISABLED   
> aggr0 em2 rxm: DEFAULTED (!port_enabled) -> PORT_DISABLED
> aggr0 em2 rxm: PORT_DISABLED (port_enabled) -> EXPIRED   
> aggr0 em1

Re: Switching from trunk(4) to aggr(4)

2020-12-14 Thread David Gwynne

> On 14 Dec 2020, at 08:40, Daniel Jakots  wrote:
> 
> On Sun, 13 Dec 2020 20:34:35 - (UTC), Stuart Henderson
>  wrote:
> 
>> On 2020-12-12, Daniel Jakots  wrote:
>>> I've been using a LACP trunk on my apu (with the three em(4)). On
>>> top of which I have some vlans. I've been doing that for years and
>>> it's working fine.  
>> 
>> I used load-balancing trunk on APU before but stopped when I came to
>> the conclusion that APU running OpenBSD wasn't going to push more
>> than 1Gbps anyway.. (I use failover way more than any type of load
>> balancing)
> 
> Yes but:
> - the three cables between the switch and the APU looks beautiful
> - I don't have to care which if is em0 and which if is em2. Just plug
>  everything.
> :)
> 
>> I don't see anything on the switch side I could change, and the log I
>> have is merely the ports going up or down when I reboot.
>> 
>>> Any idea why aggr(4) stays in no carrier status?  
>> 
>> Do you get any clues from "ifconfig aggr0 debug"?
> 
> I just tried
> # ifconfig aggr0 debug
> # dmesg
> 
> # ifconfig aggr0 down
> # ifconfig aggr0 up
> # ifconfig aggr0 # checked the debug flag was still there
> # dmesg
> 
> 
> I also looked at /var/log/messages to be safe, but found nothing relevant.
> 
>> What does the lacp status look like on the switch? (or does it just
>> say 'up' or something and not really have any status?)
> 
> It doesn't say anything about the lacp, it just says the individual
> ports are going up or down (which is normal since I'm rebooting the apu
> to apply the network config change).

Can you try tcpdump -p -veni em0 -D in and see if any LACP packets appear to 
come in on the port? If not, can you remove the -p and see if em0 starts to 
work?

There are two main differences between how aggr(4) and trunk(4) work. The 
first you've already found, which is that trunk(4) uses the address from one of 
the ports it's given, while aggr(4) generates one when it's created. The second 
difference is that trunk(4) makes member ports promisc, while aggr(4) tries to 
be a lot more precise and takes care to program the ports properly. This means 
that in your environment em(4) has to support changing its MAC address to the 
one provided by aggr(4), and it has to support joining multicast groups 
properly, including the one that LACP packets are sent to.

tcpdump with -p means that it won't make the interface promiscuous. If you 
don't see LACP packets come in while the port is not promisc, that means the 
multicast filter isn't working properly. It should start working if you're 
running tcpdump without -p on the em(4) ports, or on aggr(4) itself.
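
The reasoning behind that test can be sketched as a toy receive filter in
C. This is an illustration only, not driver code: struct toy_nic and the
toy_* names are invented for the example, and broadcast handling is left
out. The one real constant is the Slow Protocols multicast address
01:80:c2:00:00:02 that LACP frames are sent to.

```c
#include <stdint.h>
#include <string.h>

#define TOY_ADDR_LEN	6
#define TOY_MAX_MULTI	8

/* Slow Protocols multicast address, the destination of LACP frames. */
const uint8_t toy_lacp_dst[TOY_ADDR_LEN] =
    { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x02 };

/* Hypothetical model of a NIC receive filter, for illustration only. */
struct toy_nic {
	int	promisc;
	uint8_t	lladdr[TOY_ADDR_LEN];
	uint8_t	multi[TOY_MAX_MULTI][TOY_ADDR_LEN];
	int	nmulti;
};

/* Returns 1 if the NIC would hand a frame sent to dst up to the stack. */
int
toy_nic_accepts(const struct toy_nic *nic, const uint8_t *dst)
{
	int i;

	if (nic->promisc)
		return (1);
	if (memcmp(dst, nic->lladdr, TOY_ADDR_LEN) == 0)
		return (1);
	for (i = 0; i < nic->nmulti; i++) {
		if (memcmp(dst, nic->multi[i], TOY_ADDR_LEN) == 0)
			return (1);
	}
	return (0);
}

/* Join a multicast group; returns -1 when the toy table is full. */
int
toy_nic_join(struct toy_nic *nic, const uint8_t *group)
{
	if (nic->nmulti >= TOY_MAX_MULTI)
		return (-1);
	memcpy(nic->multi[nic->nmulti++], group, TOY_ADDR_LEN);
	return (0);
}
```

If the join has no effect in hardware, the model only accepts LACP frames
while promisc is set, which is the same pattern as LACP working under
trunk(4) or under a tcpdump without -p, but not under aggr(4) alone.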

Cheers,
dlg

Re: dhclient on carp

2020-07-23 Thread David Gwynne




> On 23 Jul 2020, at 22:28, Guy Godfroy  wrote:
> 
> Doesn't work better.
> I guess Sebastian is right, carp has to be assigned an IP to come up.

yeah, i just read the code a bit. they have to be able to communicate to be 
able to elect which one is the active and which is the backup. i suggest using 
an address like one in 169.254.x.y/16 so the carps can elect.

> 
> On 23/07/2020 at 03:15, David Gwynne wrote:
>>> On 22 Jul 2020, at 22:59, Guy Godfroy  wrote:
>>> 
>>> Hello,
>>> 
>>> So I read in 6.7 release note that it's finally possible to use dhclient on 
>>> CARP interface. That's great news.
>>> 
>>> However, I'm not sure how to use it on a hostname.if file. I tried to 
>>> replace inet instruction directly with dhcp:
>>> 
>>>dhcp vhid 11 carpdev em1 pass  description "test"
>>> 
>>> 
>>> But that didn't do the trick: at boot time, none of my nodes carp were in 
>>> master state so dhclient didn't manage to get any lease.
>>> 
>>> So I have first to give a static IP to my carp in order to activate it, and 
>>> only then trigger dhcp:
>>> 
>>>inet [...] vhid 11 carpdev em1 pass  description "test"
>>> 
>>>dhcp
>>> 
>>> It doesn't feel right. Is there a better way to do this?
>> hostname.if0 lines don't have to all be address configurations. generally 
>> netstart just passes the statements directly to ifconfig.
>> 
>> does something like the following work in hostname.carp0?
>> 
>> description "test"
>> vhid 11 carpdev em1 pass 
>> dhcp
>> 
>> dlg
>

Re: dhclient on carp

2020-07-22 Thread David Gwynne




> On 22 Jul 2020, at 22:59, Guy Godfroy  wrote:
> 
> Hello,
> 
> So I read in 6.7 release note that it's finally possible to use dhclient on 
> CARP interface. That's great news.
> 
> However, I'm not sure how to use it on a hostname.if file. I tried to replace 
> inet instruction directly with dhcp:
> 
>dhcp vhid 11 carpdev em1 pass  description "test"
> 
> 
> But that didn't do the trick: at boot time, none of my nodes carp were in 
> master state so dhclient didn't manage to get any lease.
> 
> So I have first to give a static IP to my carp in order to activate it, and 
> only then trigger dhcp:
> 
>inet [...] vhid 11 carpdev em1 pass  description "test"
> 
>dhcp
> 
> It doesn't feel right. Is there a better way to do this?

hostname.if0 lines don't have to all be address configurations. generally 
netstart just passes the statements directly to ifconfig.

does something like the following work in hostname.carp0?

description "test"
vhid 11 carpdev em1 pass 
dhcp

dlg

Re: non-checksummed UDP packets

2020-07-20 Thread David Gwynne




> On 20 Jul 2020, at 05:30, Stuart Henderson  wrote:
> 
> On 2020-07-19, obs...@loopw.com  wrote:
>> 
>>> Is this normal?  
>> 
>> Checksum is OPTIONAL in UDP, not required.  This is covered in RFC 768.
> 
> For IPv4, anyway. It's required for v6.

Or is it?

https://tools.ietf.org/html/rfc6935
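
For reference, a small C sketch of the checksum rules being discussed. It
is not taken from any stack: the toy_* names are made up, and the
pseudo-header that a real UDP checksum also covers is left out. RFC 768
makes a zero checksum field mean "no checksum" for IPv4, and RFC 6935
relaxes the IPv6 requirement for some tunnel protocols.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Internet checksum (RFC 1071): one's complement of the one's
 * complement sum of the data taken as 16-bit big-endian words.
 */
uint16_t
toy_in_cksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* pad odd byte */
	while (sum >> 16)				/* fold carries */
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}

/*
 * RFC 768: a transmitted checksum field of zero means "no checksum",
 * so a computed value of zero goes on the wire as all ones.
 */
uint16_t
toy_udp_wire_cksum(uint16_t computed)
{
	return (computed == 0 ? 0xffff : computed);
}
```

A receiver that sees 0 in the field of an IPv4 UDP packet therefore knows
the sender did not compute a checksum at all, which is what shows up as
non-checksummed packets in a capture.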

Re: using aggr interface instead of trunk

2020-05-19 Thread David Gwynne




> On 14 May 2020, at 4:22 pm, mabi  wrote:
> 
> Hi Iain,
> 
> ‐‐‐ Original Message ‐‐‐
> On Wednesday, May 13, 2020 7:55 PM, Iain R. Learmonth  wrote:
> 
>> More details are at: https://marc.info/?l=openbsd-cvs&m=156229058006706&w=2
> 
> I actually already read that one after seeing the announcement on 
> undeadly.org iirc ;)
> 
>> Assuming you mean trunk, not tun, yes.
> 
> Right, thanks for spotting that, I meant trunk of course.
> 
>> I don't see mention of any aggr fixes in the 6.7 changelog, so I guess it 
>> didn't have any disasters in it. Others are using it on production systems.
> 
> Nice to hear that, I will give it a shot as soon as I upgrade to 6.6 my HA 
> CARP cluster of two OpenBSD firewalls. I might first try using it on one of 
> the two firewalls so that I can easily switch to the other firewall in any 
> case of issue.

I would wait for 6.7 before using aggr(4) in production. Considering 6.7 is out 
now, there's no reason not to use it instead of 6.6.

dlg

Re: small aggr problem ( on current )

2019-12-22 Thread David Gwynne

On Thu, Dec 19, 2019 at 01:59:30PM +0100, Hrvoje Popovski wrote:
> On 15.12.2019. 23:01, Hrvoje Popovski wrote:
> > On 15.12.2019. 12:45, Holger Glaess wrote:
> >> hi
> >>
> >>
> >> running version
> >>
> >>
> >> /etc 16>dmesg | more
> >> Copyright (c) 1982, 1986, 1989, 1991, 1993
> >>  The Regents of the University of California.  All rights 
> >> reserved.
> >> Copyright (c) 1995-2019 OpenBSD. All rights reserved.
> >> https://www.OpenBSD.org
> >>
> >> OpenBSD 6.6-current (GENERIC.MP) #48: Tue Dec 10 16:30:01 MST 2019
> >> dera...@octeon.openbsd.org:/usr/src/sys/arch/octeon/compile/GENERIC.MP
> >>
> >>
> >>
> >> after a reboot the aggr interface does not aggregate the connection with
> >> the switch,
> >>
> >> only after physically disconnecting the ethernet cable, waiting for
> >> some sec,
> >>
> >> and replugging it.
> >>
> >>
> >> then the interfaces are up and active. before that, ifconfig says "no carrier"
> >> but the interfaces have
> >>
> >> carrier.
> >>
> >> i dont have the problem with the trunk interface on the same hardware.
> >>
> >>
> >> you are on bellab as root
> >> /etc 20>cat /etc/hostname.cnmac1
> >> mtu 1518
> >> up
> >>
> >> 12:43:59 Sun Dec 15
> >> you are on bellab as root
> >> /etc 21>cat /etc/hostname.cnmac2
> >> mtu 1518
> >> up
> >>
> >> 12:44:01 Sun Dec 15
> >> you are on bellab as root
> >> /etc 22>cat /etc/hostname.aggr0
> >> trunkport cnmac1
> >> trunkport cnmac2
> >> mtu 1518
> >> up
> >>
> >>
> >> holger
> >>
> >>
> >>
> > Hi,
> > 
> > maybe logs below would help for further troubleshooting because i'm
> > seeing same behavior.
> > 
> > when i add debug statement in hostname.aggr0 and boot box i'm getting
> > this log
> > 
> > starting network
> > aggr0 ix0 rxm: LACP_DISABLED (LACP_Enabled) -> PORT_DISABLED
> > aggr0 ix0: selection logic: unselected (rxm !CURRENT)
> > aggr0 ix1 rxm: LACP_DISABLED (LACP_Enabled) -> PORT_DISABLED
> > aggr0 ix1: selection logic: unselected (rxm !CURRENT)
> > aggr0 ix2 rxm: LACP_DISABLED (LACP_Enabled) -> PORT_DISABLED
> > aggr0 ix2: selection logic: unselected (rxm !CURRENT)
> > aggr0 ix3 rxm: LACP_DISABLED (LACP_Enabled) -> PORT_DISABLED
> > aggr0 ix3: selection logic: unselected (rxm !CURRENT)
> > reordering libraries: done.
> > 
> > after boot aggr status is "no carrier"
> > sh /etc/netstart isn't helping
> > 
> > but with ifconfig ix0-ix4 down/up aggr interface start to work normally
> > 
> > log when doing ifconfig ix0-ix4 down/up
> 
> 
> just a little follow up:
> 
> i've tested aggr on two boxes. first box is dell r620 and second one is
> supermicro SYS-5018D-FN8T. both boxes are connected to dell s4810
> switch. Same cables, same ports, same port-channels on switch, timeout
> fast or slow, both with ix 82599 interfaces ... (x552 ix interfaces are
> disabled on supermicro box) ...
> 
> r620 is working without any problems and supermicro box is having same
> problem as described above...
> 
> trunk interface are working on both boxes without any problem ..
> 
> 
> this is fun :)

:/

can you try this diff?

Index: if_aggr.c
===
RCS file: /cvs/src/sys/net/if_aggr.c,v
retrieving revision 1.19
diff -u -p -r1.19 if_aggr.c
--- if_aggr.c   5 Aug 2019 10:42:51 -   1.19
+++ if_aggr.c   23 Dec 2019 04:50:30 -
@@ -2401,8 +2401,7 @@ aggr_up(struct aggr_softc *sc)
 
         TAILQ_FOREACH(p, &sc->sc_ports, p_entry) {
                 aggr_rxm(sc, p, LACP_RXM_E_LACP_ENABLED);
-
-                aggr_selection_logic(sc, p);
+                aggr_p_linkch(p);
         }
 
         /* start the Periodic Transmission machine */

Re: ipv6 via he.net connectivity issues - possible regression?

2019-12-13 Thread David Gwynne

aggr(4) didn't exist in OpenBSD 6.5, so maybe that's the difference. Does the 
problem go away if you use trunk(4) instead of aggr(4)? Alternatively, could 
you build a -current kernel and make sure you have src/sys/net/if_aggr.c r1.25 
and see what effect that has?

Cheers,
dlg

> On 13 Dec 2019, at 8:06 am, Pedro Caetano  
> wrote:
> 
> Hi misc,
> 
> I'm running amd64 -current, snapshot #518.
> 
> My router has 4 em(4) interfaces.
> em0 provides ipv4 internet via vlan100 which is connected to ISP ont.
> em1, em2, em3 are bonded using aggr(4) to a lacp capable switch.
> 
> A /48 subnet is routed via gif(4) tunnel to he.net, then subnetted into
> /64s.
> 
> Three vlans exist on top of the aggr(4) device.
> Ipv4 addresses are assigned by dhcpd(8), ipv6 addresses are assigned by
> rad(8).
> 
> Hosts can acquire ip via rad(8), but are unable to access the internet
> unless the gateway is pinged.
> Hosts are also unreachable from the internet.
> 
> Unfortunately I cannot tell precisely when this behavior started, but I
> guess this was not an issue on 6.5.
> 
> Please let me know if any more information is needed.
> 
> Best regards,
> Pedro Caetano

Re: issues configuring vlan on top of aggr device

2019-12-05 Thread David Gwynne

On Tue, Dec 03, 2019 at 02:11:16PM +, Pedro Caetano wrote:
> Hi again,
> 
> I'm sorry, but since the boxes do not (yet) have working networking it is
> not easy for me to get the text output.
> I'm attaching a few pictures with the requested output.
> 
> https://picpaste.me/images/2019/12/03/cat_hostname.vl3800_hostname.aggr0.jpg
> https://picpaste.me/images/2019/12/03/ifconfig_vl3800.jpg
> https://picpaste.me/images/2019/12/03/ifconfig_aggr0.jpg
> 
> 
> Best regards,
> Pedro Caetano
> 
> On Tue, Dec 3, 2019 at 12:35 PM Hrvoje Popovski  wrote:
> 
> > On 3.12.2019. 13:15, Pedro Caetano wrote:
> > > Hi Hrvoje, thank you for the fast reply,
> > >
> > > Unfortunately I have the same behavior.
> > > The aggr0 works as expected, as I can see the links bonded on the switch.
> > > > I'm able to see the correct vids when tcpdump'ing the aggr0 interface.
> > >
> > > I'd appreciate any help on this topic.
> > >
> >
> > can you send ifconfig aggr0 and ifconfig vlan3800 ?
> >
> >
> >
> >
> > > This configuration is working on -current with em(4) nics.
> > >
> > >
> > > Best regards,
> > > Pedro Caetano
> > >
> > > > On Tuesday, 3/12/2019, 12:01, Hrvoje Popovski wrote:
> > >
> > > On 3.12.2019. 12:21, Pedro Caetano wrote:
> > > > Hi misc@
> > > >
> > > > I'm running openbsd 6.6 with latest patches running on a pair of
> > > hp dl 360
> > > > gen6 servers.
> > > >
> > > > I'm attempting to configure an aggr0 device towards a cat 3650.
> > > >
> > > > The aggr0 associates successfully with the switch, but I'm unable
> > > to run
> > > > vlans on top of it.
> > > >
> > > > The configuration on openbsd is the following:
> > > > #ifconfig aggr0 create
> > > > #ifconfig aggr0 trunkport bnx0
> > > > #ifconfig aggr0 trunkport bnx1
> > >
> > > add this - ifconfig aggr0 up
> > > if you have hostname.aggr0 add "up" at the end of that file ...
> > >
> > > > #ifconfig vlan3800 create
> > > > #ifconfig vlan3800 vnetid 3800
> > > > #ifconfig vlan3800 parent aggr0
> > > > #ifconfig vlan3800 10.80.253.10/24 
> > > > ifconfig: SIOCAIFADDR: No buffer space available.

hey,

hrvoje gave me a heads up about this, and i came up with some diffs which
seem to help according to his testing.

the most useful for you using aggr is this diff for bnx which enables
the use of jumbos. it's pretty mechanical, except that it stops
advertising the VLAN_MTU capability. instead it advertises what the
actual hardmtu is, which allows the extra 4 bytes to be used by any
protocol, not just vlan(4).

aggr(4) does not (currently) pass the VLAN_MTU capability from its
ports through for vlan(4) to use, but passing the larger hardmtu through
has the same effect.

unless anyone objects, im going to commit this tomorrow.

fyi, ifconfig foo0 hwfeatures is how you see the capabilities and
hardmtu settings.

Index: if_bnx.c
===
RCS file: /cvs/src/sys/dev/pci/if_bnx.c,v
retrieving revision 1.125
diff -u -p -r1.125 if_bnx.c
--- if_bnx.c    10 Mar 2018 10:51:46 -  1.125
+++ if_bnx.c    5 Dec 2019 09:52:04 -
@@ -875,12 +875,13 @@ bnx_attachhook(struct device *self)
         ifp->if_ioctl = bnx_ioctl;
         ifp->if_qstart = bnx_start;
         ifp->if_watchdog = bnx_watchdog;
+        ifp->if_hardmtu = BNX_MAX_JUMBO_ETHER_MTU_VLAN -
+            sizeof(struct ether_header);
         IFQ_SET_MAXLEN(&ifp->if_snd, USABLE_TX_BD - 1);
         bcopy(sc->eaddr, sc->arpcom.ac_enaddr, ETHER_ADDR_LEN);
         bcopy(sc->bnx_dev.dv_xname, ifp->if_xname, IFNAMSIZ);
 
-        ifp->if_capabilities = IFCAP_VLAN_MTU | IFCAP_CSUM_TCPv4 |
-            IFCAP_CSUM_UDPv4;
+        ifp->if_capabilities = IFCAP_CSUM_TCPv4 | IFCAP_CSUM_UDPv4;
 
 #if NVLAN > 0
         ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING;
@@ -2417,7 +2418,7 @@ bnx_dma_alloc(struct bnx_softc *sc)
          */
         for (i = 0; i < TOTAL_TX_BD; i++) {
                 if (bus_dmamap_create(sc->bnx_dmatag,
-                    MCLBYTES * BNX_MAX_SEGMENTS, BNX_MAX_SEGMENTS,
+                    BNX_MAX_JUMBO_ETHER_MTU_VLAN, BNX_MAX_SEGMENTS,
                     MCLBYTES, 0, BUS_DMA_NOWAIT, &sc->tx_mbuf_map[i])) {
                         printf(": Could not create Tx mbuf %d DMA map!\n", 1);
                         rc = ENOMEM;
@@ -2650,8 +2651,8 @@ bnx_dma_alloc(struct bnx_softc *sc)
          * Create DMA maps for the Rx buffer mbufs.
          */
         for (i = 0; i < TOTAL_RX_BD; i++) {
-                if (bus_dmamap_create(sc->bnx_dmatag, BNX_MAX_MRU,
-                    BNX_MAX_SEGMENTS, BNX_MAX_MRU, 0, BUS_DMA_NOWAIT,
+                if (bus_dmamap_create(sc->bnx_dmatag, BNX_MAX_JUMBO_MRU,
+                    1, BNX_MAX_JUMBO_MRU, 0, BUS_DMA_NOWAIT,
                     &sc->rx_mbuf_map[i])) {
                         printf(": Could not create Rx mbuf %d DMA map!\n", i);
                         rc = ENOMEM;
@@

Re: Changes to VLAN and promiscuous mode in 6.6

2019-11-03 Thread David Gwynne

Hey,

This should be fixed in current as of r1.199 of src/sys/net/if_vlan.c

Sorry for the inconvenience.

Cheers,
dlg

> On 29 Oct 2019, at 19:49, Zé Loff  wrote:
> 
> 
> Hi all
> 
> Some changes in VLAN-related code went into 6.6 and I think some of them
> changed the way the parent interface gets into promiscuous mode.  Let me
> try to explain...
> 
> Our ISP provides internet and VoIP over two separate VLANs (100 and 101,
> respectively).  Our external firewall has two physical interfaces re0,
> and re1, and also does the filtering and NATing for internet, but VoIP
> traffic is transparently forwarded to the VoIP phone.  So it's something
> like this:
> 
> GPON -> re0 -+--> vlan100  -> (PF/NAT) -> vlan90   -+-> re1 -> A switch
>  \-> vlan1010 -> bridge1  -> vlan1011 -/
> 
> The VoIP phone connected to the switch, which does all the appropriate
> tagging and untagging.  re0 and re1 have no IP addresses, neither do the
> vlan1010, vlan1011 and bridge1 virtual interfaces.  The VoIP phone gets
> configured by DHCP, and gets its address (and etc) from the ISP.  All
> interfaces are up, and correctly configured (ifconfigs below).  This
> worked fine up until the 6.6 upgrade.
> 
> Now, if things are left alone, the phone fails to get DHCP replies.
> This can be checked by running "tcpdump -i re1 vlan 101", which clearly
> shows the DHCP requests coming from the phone, but getting no replies.
> Exactly the same is seen on vlan1011 and vlan1010 (i.e. on both sides of
> the bridge1): DHCP requests but no replies.  If tcpdump is run on re0
> ("tcpdump -i re0 vlan 101") then the interface goes into promiscuous
> mode and the DHCP replies start flowing from the ISP and the phone
> finally gets configured.  Crucially, if the "-p" flag is added to
> tcpdump (i.e. not putting the if in promiscuous mode), DHCP fails.
> 
> Is this behaviour intended and, if so, can re0 be configured to stay in
> promiscuous mode without having to do something silly as tcpdump'ing
> into /dev/null?
> 
> Thanks in advance
> Zé
> 
> -- 
> 
> # ifconfig -A
> lo0: flags=8049 mtu 32768
>index 5 priority 0 llprio 3
>groups: lo
>inet6 ::1 prefixlen 128
>inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
>inet 127.0.0.1 netmask 0xff00
> re0: flags=8b43 mtu 1500
>lladdr 00:0d:b9:3c:b0:e8
>index 1 priority 0 llprio 3
>media: Ethernet autoselect (1000baseT full-duplex,master)
>status: active
> re1: flags=8843 mtu 9100
>lladdr 00:0d:b9:3c:b0:e9
>index 2 priority 0 llprio 3
>media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
>status: active
> re2: flags=8802 mtu 1500
>lladdr 00:0d:b9:3c:b0:ea
>index 3 priority 0 llprio 3
>media: Ethernet autoselect (10baseT half-duplex)
>status: no carrier
> enc0: flags=0<>
>index 4 priority 0 llprio 3
>groups: enc
>status: active
> bridge1: flags=41
>index 6 llprio 3
>groups: bridge
>priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rstp
>vlan1011 flags=3
>port 11 ifpriority 0 ifcost 0
>vlan1010 flags=3
>port 10 ifpriority 0 ifcost 0
>Addresses (max cache: 100, timeout: 240):
>00:00:5e:00:01:c9 vlan1010 1 flags=0<>
>80:5e:c0:12:3f:80 vlan1011 1 flags=0<>
> vlan100: flags=808843 mtu 1500
>lladdr 00:0d:b9:3c:b0:e8
>description: WAN
>index 9 priority 0 llprio 3
>encap: vnetid 100 parent re0 txprio packet rxprio outer
>groups: vlan egress
>media: Ethernet autoselect (1000baseT full-duplex,master)
>status: active
>inet 148.69.164.57 netmask 0xfc00 broadcast 148.69.167.255
>inet 148.69.143.1 netmask 0xfffc broadcast 148.69.143.3
> vlan1010: flags=8943 mtu 1500
>lladdr 00:0d:b9:3c:b0:e8
>description: VoIP WAN
>index 10 priority 0 llprio 3
>encap: vnetid 101 parent re0 txprio packet rxprio outer
>groups: vlan
>media: Ethernet autoselect (1000baseT full-duplex,master)
>status: active
> vlan1011: flags=8943 mtu 1500
>lladdr 00:0d:b9:3c:b0:e9
>description: VoIP DMZ
>index 11 priority 0 llprio 3
>encap: vnetid 101 parent re1 txprio packet rxprio outer
>groups: vlan
>media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
>status: active
> vlan90: flags=8843 mtu 9000
>lladdr 00:0d:b9:3c:b0:e9
>description: DMZ
>index 14 priority 0 llprio 3
>encap: vnetid 90 parent re1 txprio packet rxprio outer
>groups: vlan
>media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
>status: active
>inet 10.17.16.1 netmask 0xfe00 broadcast 10.17.17.255
> pflog0: flags=141 mtu 33136
>index 15 priority 0 llprio 3
>groups: pflog
> 
> 
>

Re: 6.6-beta (RAMDISK_CD) #281 hangs on fsck

2019-09-09 Thread David Gwynne

This should be fixed in -current now. A snapshot should pick it up in a day or 
so. Sorry for the inconvenience.

Cheers,
dlg

> On 9 Sep 2019, at 11:08 am, Luke Small  wrote:
> 
> Yay!
> -Luke
> 
> 
> On Sun, Sep 8, 2019 at 8:07 PM David Gwynne  wrote:
> I think I see the problem. We're going to try and test this locally and will 
> hopefully have something committed in a few hours time.
> 
> dlg
> 
> > On 9 Sep 2019, at 10:33, Luke Small  wrote:
> > 
> > I have mfii too:
> > dmesg | grep mfii:
> > 
> > mfii0 at pci11 dev 0 function 0 "Symbios Logic MegaRAID SAS2208" rev 0x05:
> > msi
> > mfii0: "LSI MegaRAID SAS 9271-8i", firmware 23.28.0-0010, 1024MB cache
> > scsibus1 at mfii0: 64 targets
> > scsibus2 at mfii0: 256 targets
> > 
> >> On 8.9.2019. 18:19, Luke Small wrote:
> >>> It doesn't work for me on the
> >>> ftp.hostserver.de/archive/2019-08-29-0105/amd64/
> >>> bsd.rd!
> >> 
> >> 
> >> Hi,
> >> 
> >> do you maybe have mfii on that box ?
> >> 
> >> I'm having same problem as Mischa and i have mfii. with bsd.rd fsck
> >> stops with this command
> >> 
> >> Which disk is the root disk? ('?' for details) [sd0] sd0
> >> Checking root filesystem (fsck -fp /dev/sd0a)...
> >> 
> >> On other boxes without mfii bsd.rd and sysupgrade works just fine..
> >> 
> >> between 27.08 and 29.8 i saw this commit
> >> 
> >> Changes by:  d...@cvs.openbsd.org 2019/08/27 22:55:51
> >> 
> >> Modified files:
> >>  sys/dev/pci: mfii.c
> >> 
> >> Log message:
> >> implement a DV_POWERDOWN handler to flush cache and shutdown the controller
> >> 
> >> this has been in snaps for the last week without issue, and has
> >> been running in production on a bunch of my boxes for a week before
> >> that, also without issue.
> >> 
> >> 
> >> 
>

Re: 6.6-beta (RAMDISK_CD) #281 hangs on fsck

2019-09-08 Thread David Gwynne

I think I see the problem. We're going to try and test this locally and will 
hopefully have something committed in a few hours time.

dlg

> On 9 Sep 2019, at 10:33, Luke Small  wrote:
> 
> I have mfii too:
> dmesg | grep mfii:
> 
> mfii0 at pci11 dev 0 function 0 "Symbios Logic MegaRAID SAS2208" rev 0x05:
> msi
> mfii0: "LSI MegaRAID SAS 9271-8i", firmware 23.28.0-0010, 1024MB cache
> scsibus1 at mfii0: 64 targets
> scsibus2 at mfii0: 256 targets
> 
>> On 8.9.2019. 18:19, Luke Small wrote:
>>> It doesn't work for me on the
>>> ftp.hostserver.de/archive/2019-08-29-0105/amd64/
>>> bsd.rd!
>> 
>> 
>> Hi,
>> 
>> do you maybe have mfii on that box ?
>> 
>> I'm having same problem as Mischa and i have mfii. with bsd.rd fsck
>> stops with this command
>> 
>> Which disk is the root disk? ('?' for details) [sd0] sd0
>> Checking root filesystem (fsck -fp /dev/sd0a)...
>> 
>> On other boxes without mfii bsd.rd and sysupgrade works just fine..
>> 
>> between 27.08 and 29.8 i saw this commit
>> 
>> Changes by:  d...@cvs.openbsd.org 2019/08/27 22:55:51
>> 
>> Modified files:
>>  sys/dev/pci: mfii.c
>> 
>> Log message:
>> implement a DV_POWERDOWN handler to flush cache and shutdown the controller
>> 
>> this has been in snaps for the last week without issue, and has
>> been running in production on a bunch of my boxes for a week before
>> that, also without issue.
>> 
>> 
>>

Re: Controlling OSPFD based on HAProxy state

2019-04-24 Thread David Gwynne

I've used relayd to insert routes to a service based on a health check, and
then had ospfd advertise those routes.  That might be good enough for you.

On Fri., 19 Apr. 2019, 00:40 Henry Bonath,  wrote:

> Does anyone suggest any clever way of controlling OSPFD based on the
> status of an HAProxy process?
>
> I like to use OSPFD to advertise /32 loopback IPs which HAProxy binds
> to for anycasted highly-available Reverse Proxy/Load Balancer
> services.
>
> This works great if the whole box goes down, as OSPF would no longer
> be advertising from that site, but if the HAProxy process fails for
> some reason, then it just goes down as the IP will stay in the OSPF
> table.
>
> I know there are tools like monit or supervisord which may help with
> this, but I wanted to see if anyone here may have any ideas on how to
> achieve this that I may be overlooking.
>
> Thanks!
> -Henry
>
>

Re: Viewing SFP diagnostic data in OpenBSD ?

2019-04-07 Thread David Gwynne

> On 6 Apr 2019, at 01:54, Rachel Roch  wrote:
> 
> 
> 
> 
> Apr 2, 2019, 11:19 PM by da...@gwynne.id.au:
> 
>> 
>> 
>>> On 3 Apr 2019, at 04:52, Stuart Henderson <s...@spacehopper.org> wrote:
>>> 
>>> On 2019-04-02, Rachel Roch <rr...@tutanota.de> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Hopefully I'm just searching the man pages wrong but I can't seem to find 
>>>> any hints as to how I can view SFP diagnostics in OpenBSD (i.e. light 
>>>> power etc.)
>>>> 
>>>> Perhaps someone could kindly point me in the right direction ?
>>>> 
>>>> Rachel
>>>> 
>>> 
>>> I don't think that code has been written yet.
>>> 
>> 
>> You're right, it hasn't.
>> 
>> Rachel, which nic are you interested in having this on?
>> 
>> dlg
>> 
> 
> Just spotted this email.
> 
> An Intel I350 based NIC made by HotLava  
> (https://hotlavasystems.com/products_gbe.html) 
> 

OK. I made a start on this. Have a look for "sfp module info and diagnostics" 
on tech@, or click on https://marc.info/?l=openbsd-tech&m=155469738013008&w=2

We don't have an em(4) here with optics, but a diff doesn't look too bad if 
you're willing to test it.

dlg

Re: Viewing SFP diagnostic data in OpenBSD ?

2019-04-04 Thread David Gwynne

you have em(4) with sfp?

> On 4 Apr 2019, at 18:55, Marco Prause  wrote:
> 
> I second that +1 for ix, but em would also be nice ;-)
> 
> 
> On 03.04.19 00:40, Tom Smyth wrote:
>> +1 for me also :)  ix :)
>> 
>> On Tue, 2 Apr 2019 at 23:38, Stuart Henderson  wrote:
>> 
>>>  :-)
>>> 
>

Re: Trouble forwarding between mpw's in bridge (6.4)

2019-04-02 Thread David Gwynne

Thanks to Mitchell for figuring this out.

> On 3 Apr 2019, at 05:25, Lee Nelson  wrote:
> 
> Since Mitchell's last email, this appeared from CVS in the place where
> the patch was supposed to be applied:
> 
> CLR(m0->m_flags, M_BCAST|M_MCAST);
> 
> I skipped the patch and compiled the kernel with the source as I found
> it from CVS.  With this new kernel everything works as I expected. arp
> broadcast requests coming into the bridge on one mpw are being seen by
> the router on the other mpw and arp replies are getting back to the
> requesting router.
> 
> Thank you to everyone!!!
> 
> On Tue, Apr 2, 2019 at 4:52 AM Mitchell Krome  wrote:
>> 
>> 
>> 
>> On 2/04/2019 7:57 pm, Mitchell Krome wrote:
>>> 
>>> 
>>> On 2/04/2019 7:24 pm, David Gwynne wrote:
>>>> 
>>>> 
>>>>> On 2 Apr 2019, at 6:41 pm, Mitchell Krome  wrote:
>>>>> 
>>>>> On 2/04/2019 2:08 pm, David Gwynne wrote:
>>>>>> Can you send me the hostname.* files and the output of ifconfig (showing 
>>>>>> all interfaces)?
>>>>>> 
>>>>>> You're using -current now, right?
>>>>>> 
>>>>>> dlg
>>>>>> 
>>>>>>> On 2 Apr 2019, at 08:15, lnel...@nelnet.org wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> First of all the protected domain seems to do the opposite of what I
>>>>>>> need, but it may only appear to be the case because of the strangeness
>>>>>>> with broadcast.  When trying to ping (or send any traffic) between
>>>>>>> rtr01 and rtr02 and the two mpw2's are in the same protected domain,
>>>>>>> the arp requests die in the bridge.  The arp never shows up at all on
>>>>>>> the other mpw. If I remove the mpw's from the protected domain, then
>>>>>>> the arp traffic gets through to the other mpw, but it doesn't get sent
>>>>>>> out properly by MPLS.  It's sent out as MPLS broadcast traffic
>>>>>>> originating on the physical ethernet interface but with the right label
>>>>>>> for the pseudowire. Even though the arp request itself is broadcast
>>>>>>> traffic, I would expect it to be encapsulated in a unicast MPLS packet
>>>>>>> which is sent from the MAC of the bridge or the originating router and
>>>>>>> sent as unicast to the destination router with the pseudowire's
>>>>>>> label.  As it is now, even if the destination router could figure out
>>>>>>> what to do with these MPLS broadcast packets, it would respond to the
>>>>>>> physical interface and not the bridge.
>>>>> 
>>>>> You only need the protected domain if you do a full mesh vpls (I.E.
>>>>> every router has a mpw to every other router). That wasn't the config
>>>>> you showed initially so I don't think you need it in your case.
>>>>> 
>>>>> I am running the following diff to get MPLS to work with GRE as I had a
>>>>> similar ARP issue that was caused by gre_input tagging the packets as
>>>>> MCAST and then mpls_input dropping them. When I looked into it I didn't
>>>>> think that should cause the issue I was seeing for a real interface as
>>>>> ether_input didn't re-add the MCAST flag, but I also don't have a real
>>>>> box to test on. You can give it a go and see if it helps.
>>>> 
>>>> I think you've found the problem. mpls_output replaces if_output though, 
>>>> so for interfaces with mpls enabled on this, this change causes 
>>>> BCAST|MCAST to be cleared for all outgoing packets. ie, it might break 
>>>> things like ipv6 nd on ethernet interfaces.
>>> 
>>> Yeah I had no idea what the impact of that change was, it seemed like a
>>> hack when I wrote it...
>>> 
>>>> 
>>>> What are you running on top of GRE that hit this?
>>> 
>>> I have a vpls over GRE. And I had some weird behaviour where arp was
>>> being dropped only on paths that skipped the outer MPLS label. I.E.
>>> we're directly connected to the next-hop and implicit null means we
>>> never add the LSP label, only the service label. Thanks to tcpdump not
>>> knowing about multicast MPLS over GRE and printing weirdness I worked
>>> out what was going on and tracked it down to this.

Re: Viewing SFP diagnostic data in OpenBSD ?

2019-04-02 Thread David Gwynne




> On 3 Apr 2019, at 04:52, Stuart Henderson  wrote:
> 
> On 2019-04-02, Rachel Roch  wrote:
>> Hi,
>> 
>> Hopefully I'm just searching the man pages wrong but I can't seem to find 
>> any hints as to how I can view SFP diagnostics in OpenBSD (i.e. light power 
>> etc.)
>> 
>> Perhaps someone could kindly point me in the right direction ?
>> 
>> Rachel
>> 
>> 
> 
> I don't think that code has been written yet.

You're right, it hasn't.

Rachel, which nic are you interested in having this on?

dlg

Re: Trouble forwarding between mpw's in bridge (6.4)

2019-04-02 Thread David Gwynne




> On 2 Apr 2019, at 6:41 pm, Mitchell Krome  wrote:
> 
> On 2/04/2019 2:08 pm, David Gwynne wrote:
>> Can you send me the hostname.* files and the output of ifconfig (showing all 
>> interfaces)?
>> 
>> You're using -current now, right?
>> 
>> dlg
>> 
>>> On 2 Apr 2019, at 08:15, lnel...@nelnet.org wrote:
>>> 
>>> 
>>> First of all the protected domain seems to do the opposite of what I
>>> need, but it may only appear to be the case because of the strangeness
>>> with broadcast.  When trying to ping (or send any traffic) between
>>> rtr01 and rtr02 and the two mpw2's are in the same protected domain,
>>> the arp requests die in the bridge.  The arp never shows up at all on
>>> the other mpw. If I remove the mpw's from the protected domain, then
>>> the arp traffic gets through to the other mpw, but it doesn't get sent
>>> out properly by MPLS.  It's sent out as MPLS broadcast traffic
>>> originating on the physical ethernet interface but with the right label
>>> for the pseudowire. Even though the arp request itself is broadcast
>>> traffic, I would expect it to be encapsulated in a unicast MPLS packet
>>> which is sent from the MAC of the bridge or the originating router and
>>> sent as unicast to the destination router with the pseudowire's
>>> label.  As it is now, even if the destination router could figure out
>>> what to do with these MPLS broadcast packets, it would respond to the
>>> physical interface and not the bridge.
> 
> You only need the protected domain if you do a full mesh vpls (I.E.
> every router has a mpw to every other router). That wasn't the config
> you showed initially so I don't think you need it in your case.
> 
> I am running the following diff to get MPLS to work with GRE as I had a
> similar ARP issue that was caused by gre_input tagging the packets as
> MCAST and then mpls_input dropping them. When I looked into it I didn't
> think that should cause the issue I was seeing for a real interface as
> ether_input didn't re-add the MCAST flag, but I also don't have a real
> box to test on. You can give it a go and see if it helps.

I think you've found the problem. mpls_output replaces if_output though, so for 
interfaces with mpls enabled, this change causes BCAST|MCAST to be cleared for 
all outgoing packets, ie, it might break things like ipv6 nd on ethernet 
interfaces.

What are you running on top of GRE that hit this?

For now it might be better to have mpw etc clear the flags before calling 
mpls_output.
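
As a rough illustration of what "clear the flags" means, the bit arithmetic 
can be worked through in shell. The flag values below are assumed for the 
example and should be checked against sys/mbuf.h; clear_bcast_mcast is a 
made-up name, and the real change would be C in the mpw output path, not shell.

```sh
# Illustrative values only (assumption): check sys/mbuf.h for the real ones.
M_BCAST=$((0x0100))
M_MCAST=$((0x0200))

# clear_bcast_mcast: hypothetical helper, not kernel code.
# Prints the given flags word with the broadcast and multicast bits
# cleared, i.e. the shell equivalent of flags &= ~(M_BCAST | M_MCAST).
clear_bcast_mcast() {
	echo $(( $1 & ~($M_BCAST | $M_MCAST) ))
}
```

All other bits in the flags word are left alone, which is why doing this in 
the caller only affects packets that go through that caller.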

Cheers,
dlg

> 
> 
> diff --git sys/netmpls/mpls_output.c sys/netmpls/mpls_output.c
> index b2be1fcc9..fe6e0ec42 100644
> --- sys/netmpls/mpls_output.c
> +++ sys/netmpls/mpls_output.c
> @@ -53,6 +53,9 @@ mpls_output(struct ifnet *ifp, struct mbuf *m, struct
> sockaddr *dst,
>   int  error;
>   u_int8_t ttl;
> 
> + /* reset broadcast and multicast flags, this is a P2P tunnel */
> + m->m_flags &= ~(M_BCAST | M_MCAST);
> +
>   if (rt == NULL || (dst->sa_family != AF_INET &&
>   dst->sa_family != AF_INET6 && dst->sa_family != AF_MPLS)) {
>   if (!ISSET(ifp->if_xflags, IFXF_MPLS))
> @@ -132,9 +135,6 @@ mpls_output(struct ifnet *ifp, struct mbuf *m,
> struct sockaddr *dst,
>   goto bad;
>   }
> 
> - /* reset broadcast and multicast flags, this is a P2P tunnel */
> - m->m_flags &= ~(M_BCAST | M_MCAST);
> -
>   smpls->smpls_label = shim->shim_label & MPLS_LABEL_MASK;
>   error = ifp->if_ll_output(ifp, m, smplstosa(smpls), rt);
>   return (error);

Re: Trouble forwarding between mpw's in bridge (6.4)

2019-04-01 Thread David Gwynne

Can you send me the hostname.* files and the output of ifconfig (showing all 
interfaces)?

You're using -current now, right?

dlg

> On 2 Apr 2019, at 08:15, lnel...@nelnet.org wrote:
> 
> 
>> Until recently
>> (https://github.com/openbsd/src/commit/dc68b945bbc883db108ac48a07bb89
>> 778b75582a)
>> bridge did split horizon detection by not allowing you to send
>> between
>> two mpw interfaces. In the case of a single VPLS this is the correct
>> thing, but more generally it isn't quite right. Particularly when you
>> want to bridge two separate VPLS's. It's been removed now, and to
>> achieve proper VPLS functionality with the change applied I found I
>> had
>> to add all mpw interfaces in the same VPLS to the same protected
>> domain.
>> 
>> If you update to current your config will probably work, but be
>> mindful
>> that for a full mesh VPLS if you don't put them in a protected domain
>> you'll probably get a full mesh of broadcasts.
> 
> Thanks.  Your advice on upgrading the OS along with a hack of my own
> got me to a working state, but it isn't a sustainable or stable state.
> I installed the March 31 snapshot and the split-horizon problem was
> resolved.  However, there is still a problem with arp (and probably all
> broadcast traffic, but I never get past arp).  If I create a static arp
> for rtr01 on rtr02 and rtr02 on rtr01, then everything else works. I
> can send traffic back and forth between routers over the pseudowires.
> This is a hack that works for now, but it's not really a solution.
> 
> First of all the protected domain seems to do the opposite of what I
> need, but it may only appear to be the case because of the strangeness
> with broadcast.  When trying to ping (or send any traffic) between
> rtr01 and rtr02 and the two mpw2's are in the same protected domain,
> the arp requests die in the bridge.  The arp never shows up at all on
> the other mpw. If I remove the mpw's from the protected domain, then
> the arp traffic gets through to the other mpw, but it doesn't get sent
> out properly by MPLS.  It's sent out as MPLS broadcast traffic
> originating on the physical ethernet interface but with the right label
> for the pseudowire. Even though the arp request itself is broadcast
> traffic, I would expect it to be encapsulated in a unicast MPLS packet
> which is sent from the MAC of the bridge or the originating router and
> sent as unicast to the destination router with the pseudowire's
> label.  As it is now, even if the destination router could figure out
> what to do with these MPLS broadcast packets, it would respond to the
> physical interface and not the bridge.
> 
> Without the protected domain, this is what I see on both mpw
> interfaces:
>   11   4.015737 02:3b:c0:60:4c:95 ? ff:ff:ff:ff:ff:ff ARP 42 Who has
> 192.168.99.2? Tell 192.168.99.3
>12   4.015751 02:3b:c0:60:4c:95 ? ff:ff:ff:ff:ff:ff ARP 42 Who has
> 192.168.99.2? Tell 192.168.99.3
>13   5.015772 02:3b:c0:60:4c:95 ? ff:ff:ff:ff:ff:ff ARP 42 Who has 
> 
> With the protected domain, I only see these packets on the incoming
> mpw.
> 
> The destination router sees this:
> 189   15.137231   6c:b3:11:4b:07:d4   ff:ff:ff:ff:ff:ff   
> MPLS  60  MPLS Label Switched Packet
> 202   16.161025   6c:b3:11:4b:07:d4   ff:ff:ff:ff:ff:ff   
> MPLS  60  MPLS Label Switched Packet
> 213   17.157232   6c:b3:11:4b:07:d4   ff:ff:ff:ff:ff:ff   
> MPLS  60  MPLS Label Switched Packet
> 
> 02:3b:c0:60:4c:95 is the originating router.
> 6c:b3:11:4b:07:d4 is the physical interface facing the destination
> router
> 
> By examining the MPLS packets I could see they were being sent to the
> right label.  I haven't figured out how to decode the payload, but it's
> 42 bytes which is the exact same length as the inbound arp packets.
> 
> Maybe I'm making wrong assumptions here.  I would expect that either
> the bridge does proxy arp or that the bridge would re-encapsulate
> broadcast packets back into unicast MPLS/VPLS packets on the pseudowire
> which then gets unencapsulated by the destination router and treated as
> broadcast there. Meanwhile, of course, it would also broadcast that
> same arp request out any other interface in the same bridge.
>

Re: dhcrelay multiple instances possible bug

2019-03-04 Thread David Gwynne

Hi Riccardo,

dhcrelay only operates on a single interface, so you're not missing anything 
there.

Can you show me the ps output for the dhcrelay processes you start? The rcctl 
commands you show below don't include the rcctl start dhcrelay and 
dhcrelay_second bits.

I have the following in rc.local (mostly because this config predates rcctl):

foo=192.0.2.194
bar=192.0.2.196

echo -n 'start dhcp relays:'
for i in vlan371 vlan373 \
vlan835 \
vlan801 vlan847 vlan866 vlan867 \
vlan811 vlan815 vlan816 \
vlan1101 vlan1147 vlan1165 vlan1166 \
vlan1201 vlan1231 vlan1247 vlan1265 vlan1266 \
vlan1301 vlan1331 vlan1347 vlan1365 vlan1366 \
vlan971 vlan966 \
vlan1401 vlan1465 vlan1466 vlan1467 \
vlan1501 vlan1565 vlan1566 \
vlan1601 vlan1647 vlan1665 vlan1666 vlan1667 \
vlan1701 vlan1747 vlan1765 vlan1766 \
vlan1801 vlan1865 vlan1866 \
vlan1901 vlan1965 vlan1966 \
vlan2001 vlan2065 vlan2066 vlan2067 \
vlan2008 vlan2068 \
vlan2506 vlan2533 vlan2536 vlan2531 vlan2537 vlan2547; do
/usr/sbin/dhcrelay -i ${i} $foo $bar
echo -n " ${i}"
done
echo '.'
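
Because each dhcrelay process handles one interface, it can help to print 
the command lines before running them. The following is a minimal sketch 
along the lines of the loop above; relay_cmds is a made-up helper name, and 
it only echoes the commands rather than starting anything.

```sh
# relay_cmds: hypothetical dry-run helper, not part of the base system.
# Usage: relay_cmds "server1 server2" if1 if2 ...
# Prints one dhcrelay command line per interface without running it.
relay_cmds() {
	servers=$1
	shift
	for i in "$@"; do
		echo "/usr/sbin/dhcrelay -i ${i} ${servers}"
	done
}
```

For example, relay_cmds "$foo $bar" vlan371 vlan373 prints two lines, one 
per interface, which can be compared against the ps output.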

Which produces:

xdlg@shotgun1 pf$ ps -aux | grep dhc
_dhcp40965  0.0  0.0   532  1008 ??  Ssp   10Nov17   12:06.67 
/usr/sbin/dhcrelay -i vlan371 192.0.2.194 192.0.2.196
_dhcp16825  0.0  0.0   536  1012 ??  Ssp   10Nov172:08.80 
/usr/sbin/dhcrelay -i vlan867 192.0.2.194 192.0.2.196
_dhcp69672  0.0  0.0   532  1076 ??  Isp   10Nov170:46.06 
/usr/sbin/dhcrelay -i vlan866 192.0.2.194 192.0.2.196
_dhcp48117  0.0  0.0   536   972 ??  Isp   10Nov170:00.02 
/usr/sbin/dhcrelay -i vlan373 192.0.2.194 192.0.2.196
_dhcp43065  0.0  0.0   540  1068 ??  Isp   10Nov170:06.02 
/usr/sbin/dhcrelay -i vlan835 192.0.2.194 192.0.2.196
_dhcp77793  0.0  0.0   540   988 ??  Ssp   10Nov17   19:26.92 
/usr/sbin/dhcrelay -i vlan801 192.0.2.194 192.0.2.196
_dhcp68793  0.0  0.0   540  1028 ??  Isp   10Nov170:08.40 
/usr/sbin/dhcrelay -i vlan847 192.0.2.194 192.0.2.196
_dhcp12879  0.0  0.0   540  1016 ??  Isp   10Nov171:14.46 
/usr/sbin/dhcrelay -i vlan1101 192.0.2.194 192.0.2.196
_dhcp10430  0.0  0.0   544  1052 ??  Ssp   10Nov171:42.55 
/usr/sbin/dhcrelay -i vlan811 192.0.2.194 192.0.2.196
_dhcp87753  0.0  0.0   544  1016 ??  Isp   10Nov170:31.65 
/usr/sbin/dhcrelay -i vlan815 192.0.2.194 192.0.2.196
_dhcp21434  0.0  0.0   536  1024 ??  Isp   10Nov170:00.20 
/usr/sbin/dhcrelay -i vlan816 192.0.2.194 192.0.2.196
_dhcp17816  0.0  0.0   540  1020 ??  Isp   10Nov170:00.00 
/usr/sbin/dhcrelay -i vlan1147 192.0.2.194 192.0.2.196
_dhcp67338  0.0  0.0   540  1020 ??  Isp   10Nov170:00.11 
/usr/sbin/dhcrelay -i vlan1247 192.0.2.194 192.0.2.196
_dhcp73549  0.0  0.0   540  1020 ??  Isp   10Nov170:00.55 
/usr/sbin/dhcrelay -i vlan1165 192.0.2.194 192.0.2.196
_dhcp78748  0.0  0.0   540  1012 ??  Isp   10Nov170:02.33 
/usr/sbin/dhcrelay -i vlan1166 192.0.2.194 192.0.2.196
_dhcp82689  0.0  0.0   540  1008 ??  Isp   10Nov172:02.18 
/usr/sbin/dhcrelay -i vlan1201 192.0.2.194 192.0.2.196
_dhcp31199  0.0  0.0   540   996 ??  Isp   10Nov170:07.63 
/usr/sbin/dhcrelay -i vlan1231 192.0.2.194 192.0.2.196
_dhcp21332  0.0  0.0   532  1004 ??  Isp   10Nov171:24.02 
/usr/sbin/dhcrelay -i vlan1265 192.0.2.194 192.0.2.196
_dhcp35688  0.0  0.0   544  1040 ??  Isp   10Nov170:00.28 
/usr/sbin/dhcrelay -i vlan1347 192.0.2.194 192.0.2.196
_dhcp36741  0.0  0.0   540  1032 ??  Isp   10Nov170:07.17 
/usr/sbin/dhcrelay -i vlan1266 192.0.2.194 192.0.2.196
_dhcp90274  0.0  0.0   544  1024 ??  Isp   10Nov17   19:17.78 
/usr/sbin/dhcrelay -i vlan1301 192.0.2.194 192.0.2.196
_dhcp42199  0.0  0.0   548  1052 ??  Isp   10Nov170:00.17 
/usr/sbin/dhcrelay -i vlan1331 192.0.2.194 192.0.2.196
_dhcp83979  0.0  0.0   528  1000 ??  Ssp   10Nov172:09.78 
/usr/sbin/dhcrelay -i vlan1365 192.0.2.194 192.0.2.196
_dhcp52142  0.0  0.0   536   792 ??  Isp   10Nov170:00.00 
/usr/sbin/dhcrelay -i vlan965 192.0.2.194 192.0.2.196
_dhcp17747  0.0  0.0   540   996 ??  Isp   10Nov170:05.03 
/usr/sbin/dhcrelay -i vlan1366 192.0.2.194 192.0.2.196
_dhcp85673  0.0  0.0   536   988 ??  Isp   10Nov170:11.59 
/usr/sbin/dhcrelay -i vlan947 192.0.2.194 192.0.2.196
_dhcp  266  0.0  0.0   536   964 ??  Isp   10Nov170:01.84 
/usr/sbin/dhcrelay -i vlan966 192.0.2.194 192.0.2.196
_dhcp59857  0.0  0.0   540   984 ??  Isp   10Nov174:26.67 
/usr/sbin/dhcrelay -i vlan1401 192.0.2.194 192.0.2.196
_dhcp17159  0.0  0.0   536  1012 ??  Ssp   10Nov171:27.85 
/usr/sbin/dhcrelay -i vlan971 192.0.2.194 192.0.2.196
_dhcp67613  0.0  0.0   540  1028 ??  Isp   10Nov172:29.27 
/usr/sbin/dhcrelay -i vlan1465 192.0.2.194 192.0.2.196
_dhcp33040  0.0  0.0   536   840 ??  Isp   10Nov170:00.00 
/usr/sbin/dhcrelay -i vlan1565 192.0.2.194 192.0.2.196
_dhcp 4850  0.0  0.0   544   844 ??  Isp

Re: Packet loss with latest snapshot

2019-03-04 Thread David Gwynne

On Mon, Mar 04, 2019 at 10:36:23AM +0100, Tony Sarendal wrote:
> On Mon, 4 Mar 2019, 09:43 Tony Sarendal,  wrote:
> 
> >
> >
> > On Mon, 4 Mar 2019 at 09:26, Tony Sarendal wrote:
> >
> >> On Sun, 3 Mar 2019 at 21:35, Theo de Raadt wrote:
> >>
> >>> Tony,
> >>>
> >>> Are you out of your mind?  You didn't provide even a rough hint about
> >>> what your firewall configuration looks like.  You recognize that's
> >>> pathetic, right?
> >>>
> >>> > Earlier in the week I could run parallel ping-pong tests through my
> >>> test
> >>> > firewalls
> >>> > at 300kpps without any packet loss. I updated to the latest snapshot
> >>> today
> >>> > and
> >>> > start to see packet loss at around 80kpps.
> >>> >
> >>> > /T
> >>> >
> >>> > OpenBSD 6.5-beta (GENERIC.MP) #764: Sun Mar  3 10:24:08 MST 2019
> >>> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/
> >>> GENERIC.MP
> >>> > real mem = 34300891136 (32711MB)
> >>> > avail mem = 33251393536 (31711MB)
> >>> > mpath0 at root
> >>> > scsibus0 at mpath0: 256 targets
> >>> > mainbus0 at root
> >>> > bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xec170 (34 entries)
> >>> > bios0: vendor American Megatrends Inc. version "3.0" date 04/24/2015
> >>> > bios0: Supermicro X10SLD
> >>> > acpi0 at bios0: rev 2
> >>> > acpi0: sleep states S0 S4 S5
> >>> > acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SSDT MCFG PRAD HPET SSDT
> >>> SSDT
> >>> > SPMI DMAR EINJ ERST HEST BERT
> >>> > acpi0: wakeup devices PEGP(S4) PEG0(S4) PEGP(S4) PEG1(S4) PEGP(S4)
> >>> PEG2(S4)
> >>> > PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4)
> >>> > PXSX(S4) RP05(S4) [...]
> >>> > acpitimer0 at acpi0: 3579545 Hz, 24 bits
> >>> > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> >>> > cpu0 at mainbus0: apid 0 (boot processor)
> >>> > cpu0: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz, 3500.68 MHz, 06-3c-03
> >>> > cpu0:
> >>> >
> >>> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> >>> > cpu0: 256KB 64b/line 8-way L2 cache
> >>> > cpu0: smt 0, core 0, package 0
> >>> > mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> >>> > cpu0: apic clock running at 99MHz
> >>> > cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE
> >>> > cpu1 at mainbus0: apid 2 (application processor)
> >>> > cpu1: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz, 3500.01 MHz, 06-3c-03
> >>> > cpu1:
> >>> >
> >>> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> >>> > cpu1: 256KB 64b/line 8-way L2 cache
> >>> > cpu1: smt 0, core 1, package 0
> >>> > cpu2 at mainbus0: apid 4 (application processor)
> >>> > cpu2: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz, 3500.01 MHz, 06-3c-03
> >>> > cpu2:
> >>> >
> >>> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> >>> > cpu2: 256KB 64b/line 8-way L2 cache
> >>> > cpu2: smt 0, core 2, package 0
> >>> > cpu3 at mainbus0: apid 6 (application processor)
> >>> > cpu3: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz, 3500.01 MHz, 06-3c-03
> >>> > cpu3:
> >>> >
> >>> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> >>> > cpu3: 256KB 64b/line 8-way L2 cache
> >>> > cpu3: smt 0, core 3, package 0
> >>> > ioapic0 at mainbus0: apid 8 pa 0xfec0, version 20, 24 pins
> >>> > acpimcfg0 at acpi0
> >>> > acpimcfg0: addr 0xf800, bus 0-63
> >>> > acpihpet0 at acpi0: 14318179 Hz
> >>> > acpiprt0 at acpi0: bus 0 (PCI0)
> >>> > acpiprt1 at acpi0: bus 1 (PEG0)
> >>> > acpiprt2 at acpi0: bus 2 (PEG1)
> >>> > acpiprt3 at acpi0: bus -1 (PEG2)
> >>> > acpiprt4 at acpi0: bus 3 (RP01)
> >>> > acpiprt5 at acpi0: bus -1 (RP02)
> >>> > acpiprt6 at acpi0:

Re: PPPoE vlan issue 6.4

2019-02-10 Thread David Gwynne

Hi Adam,

It sounds like you're on an ISP with very similar requirements to me. The exec 
summary of what my ISP wants is pppoe on vlan2, with the vlan priority forced 
to a single value.

Our (OpenBSD's) understanding of the priority field in VLAN headers is that it 
uses 802.1p for the field's value. 802.1p says that priorities 0 and 1 are swapped 
on the wire, and we use that consistently in the system, ie, the priority you 
see in tcpdump on a vlan interface is the same as what you configure for the 
priority value there, and vice versa. Everyone else seems to think 0 is 0 and 1 
is 1, which can be confusing.

My ISP wants priority 0 on the wire, which means 1 in OpenBSD.
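
The swap described above can be sketched as a small shell function. This is 
only an illustration of the 0/1 swap as described in this message; 
prio_to_wire is a made-up name and not part of ifconfig or any other tool.

```sh
# prio_to_wire: hypothetical helper, not an OpenBSD command.
# Maps a priority as configured on an OpenBSD vlan interface to the
# value that ends up in the VLAN header on the wire, assuming only
# 0 and 1 are swapped and 2..7 pass through unchanged.
prio_to_wire() {
	case "$1" in
	0) echo 1 ;;
	1) echo 0 ;;
	[2-7]) echo "$1" ;;
	*) echo "priority must be 0..7" >&2; return 1 ;;
	esac
}
```

With this, prio_to_wire 1 prints 0, which is the "0 on the wire means 1 in 
OpenBSD" case.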

I'm using an APU1, so I have re interfaces instead of em. I have re0 going to 
the ISP, and re1 is my internal network.

hostname.re0:
up

hostname.vlan2:
vnetid 2
parent re0
link0 llprio 1
up

hostname.pppoe0:
inet 0.0.0.0 255.255.255.255 0.0.0.1
pppoedev vlan2
authproto pap
authname 'dlg@the_isp' authkey 'secret'
group external
!/sbin/route add default -ifp pppoe0 0.0.0.1
up

hostname.re1:
inet 192.168.1.1/24


In OpenBSD 6.5 the syntax for priority on vlan frames is different. Instead of 
"link0" and "llprio 1" you just set "txprio 1".

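For 6.5, the vlan config above would change as described. A minimal sketch 
that prints such a file body follows; vlan_conf is a made-up helper name, and 
the arguments are just the example values from this message.

```sh
# vlan_conf: hypothetical helper that prints a 6.5-style hostname.vlanN
# body on stdout. Arguments: vnetid, parent interface, txprio.
# It replaces the 6.4 "link0 llprio N" line with "txprio N".
vlan_conf() {
	printf 'vnetid %s\nparent %s\ntxprio %s\nup\n' "$1" "$2" "$3"
}
```

Running vlan_conf 2 re0 1 prints the four lines for hostname.vlan2; review 
the output before redirecting it into /etc.
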
While figuring this stuff out I used the APU as a bridge between the ISP 
supplied router and the modem.

Hope this helps.

dlg


> On 10 Feb 2019, at 15:51, Adam Evans  wrote:
> 
> Some more debugging, a lot further but still no success.
> 
> I attached the DD-WRT modem directly to a computer to capture the PADI 
> packets.
> 
> Capturing from the DD-WRT modem directly, PADI packets look like the below:
> 
> 22:15:54.329145 a0:63:91:47:81:07 (oui Unknown) > Broadcast, ethertype 802.1Q 
> (0x8100), length 36: vlan 2, p 0, ethertype PPPoE D, PPPoE PADI 
> [Service-Name] [Host-Uniq 0xEE72]
>0x:  0002 8863 1109  000c 0101  0103  ...c
>0x0010:  0004 ee72    ...r..
> 
> 
> On the other end of the wire at the client the packets look like:
> 12:13:05.995412 a0:63:91:47:81:07 (oui Unknown) > Broadcast, ethertype PPPoE 
> D (0x8863), length 60: PPPoE PADI [Service-Name] [Host-Uniq 0x622A]
>   0x:  1109  000c 0101  0103 0004 622a  ..b*
>   0x0010:           
>   0x0020:       838c 7a4d   zM
> 
> 12:13:20.277749 a0:63:91:47:81:07 (oui Unknown) > Broadcast, ethertype PPPoE 
> D (0x8863), length 60: PPPoE PADI [Service-Name] [Host-Uniq 0xF02A]
>   0x:  1109  000c 0101  0103 0004 f02a  ...*
>   0x0010:           
>   0x0020:       e929 b08f   ...)..
> 
> From the above it looks like the PPPoE Discovery is not done over the vlan as 
> it gets stripped.
> 
> I updated the /etc/hostname.pppoe0 config to change pppoedev from vlan2 to 
> em0. I then plugged the device in to the bridged modem and brought up the 
> PPPoE interface which returned the below. I do not have IPv6 setup in my 
> PPPoE config so it looks like the remote tries to send me a IPv6 packet which 
> causes OpenBSD to send a terminate session response.
> 
> # ifconfig pppoe0 up
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp close(initial)
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp open(initial)
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp initial->starting
> Feb 10 13:18:48 foo /bsd: pppoe0: phase establish
> Feb 10 13:18:48 foo /bsd: pppoe0 (8863) state=1, session=0x0 output -> 
> ff:ff:ff:ff:ff:ff, len=18
> Feb 10 13:18:48 foo /bsd: pppoe0 (8863) state=2, session=0x0 output -> 
> 78:da:6e:de:db:d4, len=38
> Feb 10 13:18:48 foo /bsd: pppoe0: received unexpected PADO
> Feb 10 13:18:48 foo last message repeated 10 times
> Feb 10 13:18:48 foo /bsd: pppoe0: session 0xe84d connected
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp up(starting)
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp starting->req-sent
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp output  05-06-0f-4a-92-53-01-04-05-d4>
> Feb 10 13:18:48 foo /bsd: pppoe0 (8864) state=3, session=0xe84d output -> 
> 78:da:6e:de:db:d4, len=22
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp input(req-sent):  len=18 
> 01-04-05-d4-03-04-c0-23-05-06-b1-df-b5-ab-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00>
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp parse opts: mru auth-proto magic 
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp parse opt values: mru 1492 auth-proto 
> magic 0xb1dfb5ab send conf-ack
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp output  01-04-05-d4-03-04-c0-23-05-06-b1-df-b5-ab>
> Feb 10 13:18:48 foo /bsd: pppoe0 (8864) state=3, session=0xe84d output -> 
> 78:da:6e:de:db:d4, len=26
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp req-sent->ack-sent
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp input(ack-sent):  len=14 
> 05-06-0f-4a-92-53-01-04-05-d4-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00>
> Feb

Re: SNMP reporting on VXLAN interfaces

2018-08-16 Thread David Gwynne

On Thu, Aug 16, 2018 at 10:51:25AM +1000, Jason Tubnor wrote:
> Hi,
> 
> Not sure if anyone else here is using SNMP for obtaining VXLAN(4) adapter
> throughput but after some testing (clamping with PF queues), I have
> discovered that throughput on VXLAN interfaces via SNMP are reporting
> exactly double the data throughput than what is measured either through
> iperf or pfctl -vvsq .  Regular interfaces on the machine below (vmx) are
> reporting correctly.
> 
> Am I missing something here or could it be a potential bug in the VXLAN
> code in how it reports into snmpd?

The vxlan driver counts something that the network stack does for it
now. The diff below fixes the problem if you want to try it, but I will
be committing it soon.
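
To illustrate the kind of double accounting this describes, here is a toy model in Python. It is not the kernel code: the Interface class and the function names are invented, and where exactly the stack does its counting is an assumption made for the example.

```python
class Interface:
    """Stand-in for the per-interface output counters."""

    def __init__(self):
        self.opackets = 0
        self.obytes = 0


def stack_output(ifp, pkt_len, driver_output):
    # Assumed for this model: the stack accounts for every packet itself
    # before handing it to the driver.
    ifp.opackets += 1
    ifp.obytes += pkt_len
    driver_output(ifp, pkt_len)


def driver_output_old(ifp, pkt_len):
    # The driver counts the same packet again, so totals come out doubled.
    ifp.opackets += 1
    ifp.obytes += pkt_len


def driver_output_fixed(ifp, pkt_len):
    # Leave the accounting to the stack.
    pass
```

Sending the same traffic through both variants shows the old one reporting twice the bytes, which matches the doubled figures seen via SNMP.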

Cheers,
dlg

Index: if_vxlan.c
===
RCS file: /cvs/src/sys/net/if_vxlan.c,v
retrieving revision 1.67
diff -u -p -r1.67 if_vxlan.c
--- if_vxlan.c  20 Feb 2018 01:20:37 -  1.67
+++ if_vxlan.c  17 Aug 2018 01:36:55 -
@@ -929,9 +929,6 @@ vxlan_output(struct ifnet *ifp, struct m
 	bridge_tunneluntag(m);
 #endif
 
-	ifp->if_opackets++;
-	ifp->if_obytes += m->m_pkthdr.len;
-
 	m->m_pkthdr.ph_rtableid = sc->sc_rdomain;
 
 #if NPF > 0

Re: OSPF over gif on top of IPsec transport -current

2018-03-13 Thread David Gwynne

> On 10 Mar 2018, at 08:01, Remi Locherer  wrote:
> 
> 
> With below diff the setup works as expected: tcpdump shows OSPF hellos
> on gif0 and ospfd sees the neighbour.
> 
> I don't think it's the correct fix though.

functionally it is the correct fix.

when i reworked gif(4) in src/sys/net/if_gif.c r1.108, i merged the ipv4 and 
ipv6 input paths. the ipv6 input code had this check, but ipv4 did not. now it 
is applied to ipv4, but it is obviously wrong for both address families.

please commit the removal of this check, ok by me.

thank you to everyone for the bug report and debugging. i'm sorry for taking so 
long to figure this out. 

dlg 

> 
> 
> Index: if_gif.c
> ===
> RCS file: /cvs/src/sys/net/if_gif.c,v
> retrieving revision 1.112
> diff -u -p -r1.112 if_gif.c
> --- if_gif.c  28 Feb 2018 23:28:05 -  1.112
> +++ if_gif.c  9 Mar 2018 20:52:46 -
> @@ -745,8 +745,8 @@ gif_input(struct gif_tunnel *key, struct
>   }
>   
>   /* XXX What if we run transport-mode IPsec to protect gif tunnel ? */
> - if (m->m_flags & (M_AUTH | M_CONF))
> - return (-1);
> + //if (m->m_flags & (M_AUTH | M_CONF))
> + //  return (-1);
> 
>   key->t_rtableid = m->m_pkthdr.ph_rtableid;

Re: OSPF over gif on top of IPsec transport -current

2018-03-13 Thread David Gwynne

> On 11 Mar 2018, at 05:30, Atanas Vladimirov  wrote:
> 
> On 2018-03-10 00:01, Remi Locherer wrote:
>>> 
>> With below diff the setup works as expected: tcpdump shows OSPF hellos
>> on gif0 and ospfd sees the neighbour.
>> I don't think it's the correct fix though.
>> Index: if_gif.c
>> ===
>> RCS file: /cvs/src/sys/net/if_gif.c,v
>> retrieving revision 1.112
>> diff -u -p -r1.112 if_gif.c
>> --- if_gif.c 28 Feb 2018 23:28:05 -  1.112
>> +++ if_gif.c 9 Mar 2018 20:52:46 -
>> @@ -745,8 +745,8 @@ gif_input(struct gif_tunnel *key, struct
>>  }
>>  /* XXX What if we run transport-mode IPsec to protect gif tunnel ? */
>> -if (m->m_flags & (M_AUTH | M_CONF))
>> -return (-1);
>> +//if (m->m_flags & (M_AUTH | M_CONF))
>> +//  return (-1);
>>  key->t_rtableid = m->m_pkthdr.ph_rtableid;
> 
> Hi Remi,
> 
> Thanks for confirming that there is an issue and I'm not doing something 
> wrong on my side.
> I'll try the diff as soon as possible.

it isnt clear to me how ipsec and gif(4) are supposed to interact. on the one 
hand you have the gif(4) manpage saying this:

BUGS
 There are many tunnelling protocol specifications, defined differently
 from each other.  gif may not interoperate with peers which are based on
 different specifications, and are picky about outer header fields.  For
 example, you cannot usually use gif to talk with IPsec devices that use
 IPsec tunnel mode.

so it's saying that ipsec tunnel mode and gif don't work, but then you have the 
code that remi is disabling saying that gif and ipsec transport dont work.

i can understand the issue since a decrypted esp packet looks a lot like the 
packets gif wants to handle. if we change the code or doco to make something 
work, which way should we go?

right now i would use gre inside ipsec transport mode, not gif. it has the 
benefit of working, and it is harder for traffic inside the tunnel to leak out 
of ipsec. more specifically, gif handles 3 ip protocols, ipv4, ipv6, and mpls, 
which are ip protocol numbers 4, 41, and 137 respectively. it is likely that 
people could set up ipsec to protect ipv4, but forget about ipv6 and mpls. if 
you then configure v6 or mpls on the gif interface, that traffic will leak.
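
the leak described above can be sketched as a simple coverage check. this is an illustration in Python only: the helper name unprotected and the idea of describing an ipsec policy as a set of ip protocol numbers are assumptions for the example, not how ipsec flows are really represented. the gif protocol numbers are the ones given above; 47 is the ip protocol number for gre.

```python
# IP protocol numbers a gif tunnel can emit, as listed in the text.
GIF_PROTOCOLS = {"ipv4": 4, "ipv6": 41, "mpls": 137}

# GRE is a single IP protocol.
GRE_PROTOCOLS = {"gre": 47}


def unprotected(tunnel_protocols, ipsec_protocols):
    """Return the names of tunnel payload types that the IPsec policy
    (modelled here as a plain set of IP protocol numbers) does not cover."""
    return sorted(
        name
        for name, number in tunnel_protocols.items()
        if number not in ipsec_protocols
    )
```

protecting only protocol 4 on a gif tunnel leaves ipv6 and mpls uncovered, while a single rule for protocol 47 covers everything a gre tunnel sends.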

gre on the other hand is a single ip protocol, so more straightforward to 
protect. there's also a very clear line in the sand between the inner and outer 
traffic, which esp tunnel and transport mode lack.

dlg

Re: gif(4) changes vs tunnelbroker

2018-02-28 Thread David Gwynne


> On 1 Mar 2018, at 02:22, Andreas Bartelt <o...@bartula.de> wrote:
> 
> On 02/27/18 22:35, Pavel Korovin wrote:
>> On 02/28, David Gwynne wrote:
>>> what is the status of sysctl net.inet.ipip ?
>> David, thank you! That was easy :)
>> Sorry for the noise.
>> $ sysctl net.inet.ipip.allow
>> net.inet.ipip.allow=0
>> # sysctl -w net.inet.ipip.allow=1
>> net.inet.ipip.allow: 0 -> 1
>> $ ping6 www.google.com
>> PING www.google.com (2a00:1450:4013:c01::67): 56 data bytes
>> 64 bytes from 2a00:1450:4013:c01::67: icmp_seq=0 hlim=48 time=40.500 ms
>> 64 bytes from 2a00:1450:4013:c01::67: icmp_seq=1 hlim=48 time=40.645 ms
>> ^C
> 
> I'm also observing a breakage of a previously working IPv6 tunnelbroker 
> config on current (problem introduced since at least Feb, 23rd).
> 
> The combination of two things made it work again (or at least works around 
> the underlying problem):
> 1) sysctl net.inet.ipip.allow=1 [not yet documented at 
> www.openbsd.org/faq/current.html]
> 2) removing ``set state-policy if-bound'' from my pf.conf [which always 
> worked before with the same tunnelbroker setup]
> 
> According to pflog(4), a ping6 to some destination now looks buggy to me:
> - outgoing icmp6 echo request is only visible on gif(4)
> - incoming icmp6 echo reply is only visible on the underlying physical 
> interface of gif(4)
> which blocks the ping6 in the case of ``set state-policy if-bound''.

i found what i think is the problem.

it turns out the net.inet.ipip.allow sysctl was a red herring. it controls the 
processing of ipip by the network stack, it is not related to whether gif 
should accept packets. the problem was i got the mapping of ip addresses in 
incoming packets to the addresses on the tunnels wrong.

this should be fixed in src/sys/net/if_gif.c r1.112.

sorry for the inconvenience.

dlg

Re: gif(4) changes vs tunnelbroker

2018-02-27 Thread David Gwynne

> On 27 Feb 2018, at 4:10 am, Pavel Korovin  wrote:
> 
> Dear all,
> 
> After upgrading several hosts to -current I noticed that all my IPv6 tunnels
> via tunnelbroker stopped working. Recently introduced changes to gif(4) 
> (since 
> late December 2017) are too complex for me to grasp, maybe anybody on the list
> can advise.

hi pavel,

there was a window where gif only allowed configuration of the tunnel 
parameters while the interface was down, but still implicitly brought the 
interface up when addresses were configured. a lot of gif configs (or tunnel 
configs generally) have the ips set before the tunnel, so they'd go up, and 
then prevent configuration.

this has been fixed in -current, but a snap with the fix may not have made it 
out.

if this isn't the problem, can you send me your config and the state of the gif 
interfaces that are at fault and i'll see what else i broke.

cheers,
dlg

> 
> -- 
> With best regards,
> Pavel Korovin
>

Re: re0 and re1 watchdog timeouts, and system freeze

2017-06-11 Thread David Gwynne

On Fri, Jun 09, 2017 at 07:19:34PM +0200, Björn Ketelaars wrote:
> On Fri 09/06/2017 12:07, Martin Pieuchot wrote:
> > On 08/06/17(Thu) 20:38, Björn Ketelaars wrote:
> > > On Thu 08/06/2017 16:55, Martin Pieuchot wrote:
> > > > On 07/06/17(Wed) 09:43, Björn Ketelaars wrote:
> > > > > On Sat 03/06/2017 08:44, Björn Ketelaars wrote:
> > > > > > 
> > > > > > Reverting back to the previous kernel fixed the issue above. 
> > > > > > Question: can
> > > > > > someone give a hint on how to track this issue?
> > > > > 
> > > > > After a bit of experimenting I'm able to reproduce the problem. 
> > > > > Summary is
> > > > > that queueing in pf and use of a current (after May 30), multi 
> > > > > processor
> > > > > kernel (bsd.mp from snapshots) causes these specific watchdog timeouts
> > > > > followed by a system freeze.
> > > > > 
> > > > > Issue is 'gone' when:
> > > > > 1.) using an older kernel (before May 30);
> > > > > 2.) removal of queueing statements from pf.conf. Included below the 
> > > > > specific
> > > > > snippet;
> > > > > 3.) switch from MP kernel to SP kernel.
> > > > > 
> > > > > New observation is that while queueing, using a MP kernel, the 
> > > > > download
> > > > > bandwidth is only a fraction of what is expected. Exchanging the MP 
> > > > > kernel
> > > > > with a SP kernel restores the download bandwidth to expected level.
> > > > > 
> > > > > I'm guessing that this issue is related to recent work on PF?
> > > > 
> > > > It's certainly a problem in, or exposed by, re(4) with the recent MP 
> > > > work
> > > > in the network stack.
> > > > 
> > > > It would help if you could build a kernel with MP_LOCKDEBUG defined and
> > > > see if the resulting kernel enters ddb(4) instead of freezing.
> > > > 
> > > > Thanks,
> > > > Martin
> > > 
> > > Thanks for the hint! It helped in entering ddb. I collected a bit of 
> > > output,
> > > which you can find below. If I read the trace correctly the crash is 
> > > related
> > > to line 1750 of sys/dev/ic/re.c:
> > > 
> > >   d->rl_cmdstat |= htole32(RL_TDESC_CMD_EOF);
> > 
> > Could you test the diff below, always with a MP_LOCKDEBUG kernel and
> > tell us if you can reproduce the freeze or if the kernel enters ddb(4)?
> > 
> > Another question, how often do you see "watchdog timeout" messages?
> > 
> > Index: re.c
> > ===
> > RCS file: /cvs/src/sys/dev/ic/re.c,v
> > retrieving revision 1.201
> > diff -u -p -r1.201 re.c
> > --- re.c  24 Jan 2017 03:57:34 -  1.201
> > +++ re.c  9 Jun 2017 10:04:43 -
> > @@ -2074,9 +2074,6 @@ re_watchdog(struct ifnet *ifp)
> > s = splnet();
> > printf("%s: watchdog timeout\n", sc->sc_dev.dv_xname);
> >  
> > -   re_txeof(sc);
> > -   re_rxeof(sc);
> > -
> > re_init(ifp);
> >  
> > splx(s);
> 
> The diff (with a MP_LOCKDEBUG kernel) resulted in similar traces as before.
> ddb Output is included below.
> 
> With your diff the number of timeout messages decreased from 9 to 2 before
> entering ddb.

can you try the diff below please?

Index: hfsc.c
===
RCS file: /cvs/src/sys/net/hfsc.c,v
retrieving revision 1.39
diff -u -p -r1.39 hfsc.c
--- hfsc.c  8 May 2017 11:30:53 -   1.39
+++ hfsc.c  12 Jun 2017 05:08:01 -
@@ -817,7 +817,7 @@ hfsc_deferred(void *arg)
 	KASSERT(HFSC_ENABLED(ifq));
 
 	if (!ifq_empty(ifq))
-		(*ifp->if_qstart)(ifq);
+		ifq_start(ifq);
 
 	hif = ifq->ifq_q;

Re: SCSI Enclosure Service

2017-06-08 Thread David Gwynne

hey jens,

from what i can tell, you talk to the ami mg9071 chips on that enclosure using 
sgpio, not in band using smp (sas mgmt protocol) or ses as a scsi device.

i get the impression that mpii hardware does have some understanding of 
enclosures connected via sgpio, but i'm not sure what benefit it would provide. 
it may affect addressing on the bus, but im not sure you'd get temperatures or 
fan speeds or anything off it.

cheers,
dlg

> On 9 Jun 2017, at 02:05, Jens A. Griepentrog  
> wrote:
> 
> Dear Listeners,
> 
> Let me know, please, if enclosure monitoring
> is supported for disks attached to Supermicro
> M28SAB drive cages (with two AMI MG9071 chips)
> or similar backplanes. Drives work fine when
> attached to some LSI 2008 controller but there
> appear no "ses* at scsibus?" boot messages
> (see below, disks attached to the drive cage
> are sd4 ... sd11), jumper settings on the cage:
> JP61 2-3: Fan disabled (there is no fan)
> JP62 1-2: Enclosure monitor enabled
> 
> With best regards,
> Jens
> 
> 
> 
> OpenBSD 6.1 (GENERIC.MP) #6: Mon May 22 20:34:30 CEST 2017
> rob...@syspatch-61-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 17154113536 (16359MB)
> avail mem = 16629547008 (15859MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xf06f0 (62 entries)
> bios0: vendor American Megatrends Inc. version "0705" date 06/29/2010
> bios0: ASUSTeK Computer INC. P7F-M WS
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S1 S3 S4 S5
> acpi0: tables DSDT FACP APIC MCFG OEMB HPET SSDT
> acpi0: wakeup devices BR1E(S4) UAR1(S4) PS2K(S4) EUSB(S4) USB0(S4) USB1(S4) 
> USB2(S4) USB3(S4) USBE(S4) USB4(S4) USB5(S4) USB6(S4) BR21(S4) BR22(S4) 
> BR23(S4) P0P1(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU L3426 @ 1.87GHz, 1867.00 MHz
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: TSC frequency 1867000680 Hz
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 133MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Xeon(R) CPU L3426 @ 1.87GHz, 1866.73 MHz
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: Intel(R) Xeon(R) CPU L3426 @ 1.87GHz, 1866.73 MHz
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 6 (application processor)
> cpu3: Intel(R) Xeon(R) CPU L3426 @ 1.87GHz, 1866.73 MHz
> cpu3: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR
> cpu3: 256KB 64b/line 8-way L2 cache
> cpu3: smt 0, core 3, package 0
> ioapic0 at mainbus0: apid 7 pa 0xfec0, version 20, 24 pins
> acpimcfg0 at acpi0 addr 0xe000, bus 0-255
> acpihpet0 at acpi0: 14318179 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 7 (BR1E)
> acpiprt2 at acpi0: bus -1 (BR21)
> acpiprt3 at acpi0: bus -1 (BR22)
> acpiprt4 at acpi0: bus -1 (BR23)
> acpiprt5 at acpi0: bus -1 (P0P1)
> acpiprt6 at acpi0: bus 1 (P0P3)
> acpiprt7 at acpi0: bus -1 (P0P4)
> acpiprt8 at acpi0: bus -1 (P0P5)
> acpiprt9 at acpi0: bus -1 (P0P6)
> acpiprt10 at acpi0: bus 2 (BR20)
> acpiprt11 at acpi0: bus 5 (BR26)
> acpiprt12 at acpi0: bus 6 (BR27)
> acpicpu0 at acpi0: !C3(350@17 mwait.1@0x20), !C3(500@17 mwait.1@0x10), 
> C1(1000@1 mwait.1), PSS
> acpicpu1 at acpi0: !C3(350@17 mwait.1@0x20), !C3(500@17 mwait.1@0x10), 
> C1(1000@1 mwait.1), PSS
> acpicpu2 at acpi0: !C3(350@17 mwait.1@0x20), !C3(500@17 mwait.1@0x10), 
> C1(1000@1 mwait.1), PSS
> acpicpu3 at acpi0: !C3(350@17 mwait.1@0x20), !C3(500@17 mwait.1@0x10), 
> C1(1000@1 mwait.1), PSS
> "PNP0501" at acpi0 not configured
> "PNP0303" at acpi0 not configured
> acpibtn0 at acpi0: PWRB
> ipmi at mainbus0 not configured
> cpu0: Enhanced SpeedStep 1867 MHz: speeds: 1868, 1867,

Re: Does CARP need Layer 2 ?

2017-04-17 Thread David Gwynne

> On 18 Apr 2017, at 03:54, Bob Jones 
>  wrote:
> 
> Hi,
> 
> Looking at the docs, unlike pfsync, sasyncd and everything else, you
> seem to be unable to define a "different" interface to CARP for the
> purposes of monitoring.  Everything seems to need to go over the one
> carpdev.
> 
> My question arises is because I have a couple of OpenBSD units due to
> be plugged into upstream router ports (direct patch, not via
> intermediate switch).
> 
> Obviously for most things, OSPF and BGP will take care of redundancy.
> But for the purposes of VPN failover, I would like to use CARP on my
> "external" interfaces, but as far as my interpretation of the docs go,
> CARP protocol won't work over Layer 3 ?

that's correct.

> Could someone provide further insight into whether my interpretation
> is correct, and whether I have any other options available ?  I don't
> really want to go adding a layer 2 switch on my side because that just
> introduces extra point of failure.

off the top of my head, you have two paths you could take.

firstly, you could advertise the vpn service as the same ip addresses bound to 
loopback (lo(4)) interfaces on each of the hosts. ie, a cheap and cheerful 
anycast setup. bgp as your routing protocol should work well for this if you're 
interested in an active/passive setup.

the second option could be to set up a l2 medium between your hosts, 
specifically, you can set up etherip tunnels between them and land your carp 
interface on that.

just some ideas.

cheers,
dlg

Re: Per-device multiqueuing would be fantastic. Are there any plans? Are donations a matter here?

2017-02-10 Thread David Gwynne

> On 9 Feb 2017, at 7:11 pm, Mikael <mikael.ml...@gmail.com> wrote:
>
> 2017-02-09 16:41 GMT+08:00 David Gwynne <da...@gwynne.id.au>:
> ..
> hey mikael,
>
> can you be more specific about what you mean by multiqueuing for disks? even a
> reference to an implementation of what you’re asking about would help me
> answer this question.
>
> ill write up a bigger reply after my kids are in bed.
>
> cheers,
> dlg
>
> Hi David,
>
> Thank you for your answer.
>
> The other OpenBSD:ers I talked to also used the wording "multiqueue". My
> understanding of the kernel's workings here is too limited.
>
> If I would give a reference to some implementation out there, I guess I
> would to the one introduced in Linux 3.13/3.16:
>
> "Linux Block IO: Introducing Multi-queue SSD Access on Multi-core Systems"
> http://kernel.dk/blk-mq.pdf
>
> "Linux Multi-Queue Block IO Queueing Mechanism (blk-mq)"
> https://www.thomas-krenn.com/en/wiki/Linux_Multi-Queue_Block_IO_Queueing_Mechanism_(blk-mq)
>
> "The multiqueue block layer"
> https://lwn.net/Articles/552904/
>
> Looking forward a lot to your followup.

sorry, i fell asleep too.

thanks for the links to info on linux mq stuff. i can understand what it
provides. however, in the situation you are testing im not sure it is
necessarily the means to addressing the difference in performance you’re
seeing in your environment.

anyway, tldr: you’re suffering under the kernel's big giant lock.

according to the dmesg you provided you’re testing a single ssd (a samsung
850) connected to a sata controller (ahci). with this equipment all operations
between the computer and the actual disk are issued through ahci. because
of the way ahci operates, operations on a specific disk are effectively serialised
at this point. in your setup you have multiple cpus though, and it sounds like
your benchmark runs on them concurrently, issuing io through the kernel to the
disk via ahci.

two things are obviously different between linux and openbsd that would affect
this benchmark. the first is that io to physical devices is limited to a value
called MAXPHYS in the kernel, which is 64 kilobytes. any larger read
operations issued by userland to the kernel get cut up into a series of 64k
reads against the disk. ahci itself can handle 4 meg per transfer.
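
a small sketch of what that splitting looks like, in Python for brevity. the 64 kilobyte limit comes from the text above; the function name split_read and the (offset, length) tuples are only for illustration and do not mirror the kernel's actual code.

```python
MAXPHYS = 64 * 1024  # per-transfer limit described in the text, in bytes


def split_read(offset, length, maxphys=MAXPHYS):
    """Cut one large read into a list of (offset, length) transfers,
    none of which is larger than maxphys bytes."""
    chunks = []
    while length > 0:
        n = min(length, maxphys)
        chunks.append((offset, n))
        offset += n
        length -= n
    return chunks
```

for example, a single 1 megabyte read from userland becomes 16 separate 64k transfers, where ahci could in principle have done it in one.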

the other difference is that, like most of the kernel, read() is serialised by
the big lock. the result of this is if you have userland on multiple cpus
creating a heavily io bound workload, all the cpus end up waiting for each
other to run. while one cpu is running through the io stack down to ahci,
every other cpu is spinning waiting for its turn to do the same thing.

the distance between userland and ahci is relatively long. going through the
buffer cache (i.e., /dev/sd0) is longer than bypassing it (through /dev/rsd0).
your test results confirm this.

the solution to this problem is to look at taking the big lock away from the
io paths. this is non-trivial work though.

i have already spent time working on making sd(4) and the scsi midlayer
mpsafe, but haven’t been able to take advantage of that work because both
sides of the scsi subsystem (adapters like ahci and the block layer and
syscalls) still need the big lock. some adapters have been made mpsafe, but i
dont think ahci was on that list. when i was playing with mpsafe scsi, i gave
up the big lock at the start of sd(4) and ran it, the midlayer, and mpi(4) or
mpii(4) unlocked. if i remember correctly, even just unlocking that part of
the stack doubled the throughput of the system.

the work ive done in the midlayer should mean if we can access it without
biglock, accesses to disks beyond adapters like ahci should scale pretty well
across cpu cores because of how io is handed over to the midlayer. concurrent
submissions by multiple cpus end up delegating one of the cpus to operate on
the adapter on behalf of all the cpus. while that first cpu is still
submitting to the hardware, other cpus are not blocked from queuing more work
and returning to userland.
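
the delegation idea in the paragraph above can be sketched in a few lines. this is a minimal illustration in Python, not the scsi midlayer code: the class name AdapterSerializer and the submit_to_hw callback are invented for the example, and a Python lock stands in for whatever the kernel really uses.

```python
import threading
from collections import deque


class AdapterSerializer:
    """Whoever finds the adapter idle pushes queued io to the hardware on
    behalf of everyone; other submitters just queue their io and return."""

    def __init__(self, submit_to_hw):
        self._lock = threading.Lock()
        self._pending = deque()
        self._running = False
        self._submit_to_hw = submit_to_hw  # hypothetical hardware hook

    def submit(self, io):
        """Queue io. Returns True if this caller ended up doing the
        hardware submission work, False if another caller will do it."""
        with self._lock:
            self._pending.append(io)
            if self._running:
                # Someone else is already talking to the adapter; they
                # will pick this io up, so return straight away.
                return False
            self._running = True

        # This caller is now the delegate: drain the queue.
        while True:
            with self._lock:
                if not self._pending:
                    self._running = False
                    return True
                work = self._pending.popleft()
            # The lock is not held here, so others can keep queuing.
            self._submit_to_hw(work)
```

the lock is only held long enough to touch the queue, never across the hardware submission itself, which is what lets the other submitters return immediately.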

i can go into more detail if you want.

cheers,
dlg

Re: Per-device multiqueuing would be fantastic. Are there any plans? Are donations a matter here?

2017-02-09 Thread David Gwynne

> On 9 Feb 2017, at 12:42 pm, Mikael  wrote:
>
> Hi misc@,
>
> The SSD reading benchmark in the previous email shows that per-device
> multiqueuing will boost multithreaded random read performance very much
> e.g. by ~7X+, e.g. the current 50MB/sec will increase to ~350MB/sec+.
>
> (I didn't benchmark yet but I suspect the current 50MB/sec is system-wide,
> whereas with multiqueuing the 350MB/sec+ would be per drive.)
>
> Multiuser databases, and any parallell file reading activity, will/would
> see a proportional speedup with multiqueing.

hey mikael,

can you be more specific about what you mean by multiqueuing for disks? even a
reference to an implementation of what you’re asking about would help me
answer this question.

ill write up a bigger reply after my kids are in bed.

cheers,
dlg

>
>
> Do you have plans to implement this?
>
> Was anything done to this end already, any idea when multiqueueing can
> happen?
>
>
> Are donations a matter here, if so about what size of donations and to who?
>
> Someone suggested that implementing it would take a year of work.
>
> Any clarifications of what's going on and what's possible and how would be
> much appreciated.
>
>
> Thanks,
> Mikael

Re: NVM Express (NVMe) support status

2016-04-15 Thread David Gwynne

> On 12 Feb 2016, at 7:01 PM, Evgeniy Sudyr  wrote:
>
> Hi all,
>
> I'm looking status of NVM Express support in -current (got Intel 750
> consumer device
> https://www-ssl.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-750-series.html
> for home desktop, but it looks like all devices are using the same
> Specification).
>
> I found 2 commits of nvme_pci.c from @dlg there:
>
> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pci/nvme_pci.c
>
> But commit message sounds work is abandoned, because of problems faced.
>
> I found specification exists there
> http://www.nvmexpress.org/specifications/
>
> It also works for me under Linux and NVMe driver is maintained by
> Intel developer Matthew Wilcox.
> https://github.com/torvalds/linux/tree/master/drivers/nvme
>
> Looks already implemented in FreeBSD (didn't tested yet):
>
> http://svnweb.freebsd.org/base/head/sys/dev/nvme/nvme.h?view=log=240616
> https://svnweb.freebsd.org/base/head/sys/dev/nvme/
>
> It will be great to get this "awesome fast" storage support in next
> OpenBSD release(s).
>
> Anybody aware of any plans on this?

it might work if you give it a go now.

Re: Gif tunnel / pf / queueing

2016-03-02 Thread David Gwynne

> On 2 Mar 2016, at 1:51 AM, Christopher Sean Hilton wrote:
>
> I would like to apply queueing to packets traversing a gif tunnel. I'd
> like to know what works better, Tagging outbound packets on the gif
> interface and applying them to queues by tag when they leave on the
> external interface? Or assigning packets to the queues directly when
> they are on the gif interface?
>
> If I understand things correctly queues work on interfaces. That leads
> me to think that tagging for later queueing is the better approach.

in this instance it shouldn't matter. however, if you have multiple outgoing
interfaces the gif traffic can leave on, it's better to apply the policy on
the gif interface.

>
> --
> Chris
>
>  __o  "All I was trying to do was get home from work."
>_`\<,_   -Rosa Parks
> ___(*)/_(*).___o..___..o...ooO..._
> Christopher Sean Hilton[chris/at/vindaloo/dot/com]
>

Re: PF: can't make queueing and priority work as expected

2016-01-15 Thread David Gwynne

> On 15 Jan 2016, at 9:07 PM, Craig Skinner <skin...@britvault.co.uk> wrote:
>
> On 2016-01-15 Fri 12:53 PM |, David Gwynne wrote:
>>> On 13 Jan 2016, at 19:19, Marko Cupać <marko.cu...@mimar.rs> wrote:
>>>
>>> Have we come to conclusion that currently prio makes no sense at all?
>>
>> it wont have the effect you want. that doesn't mean it doesn't make sense
>> somewhere else.
>>
>
> Such as an ADSL PPPoE bridge?

yeah.

the other thing to note is that loading a ruleset resets the assignment of
existing states to queues.

states are assigned to queues via rules, but if the rules go away (which is
what happens when you load a new ruleset) the intermediary between rules and
queues has gone.

it kind of sucks, especially for testing.

dlg

Re: PF: can't make queueing and priority work as expected

2016-01-14 Thread David Gwynne

> On 13 Jan 2016, at 19:19, Marko Cupać <marko.cu...@mimar.rs> wrote:
>
> On Tue, 12 Jan 2016 16:40:58 +0100
> Claudio Jeker <cje...@diehard.n-r-g.com> wrote:
>
>> On Tue, Jan 12, 2016 at 05:33:06AM -0700, Daniel Melameth wrote:
>>> On Mon, Jan 11, 2016 at 9:37 PM, David Gwynne <da...@gwynne.id.au>
>>> wrote:
>>>>> On 11 Jan 2016, at 22:43, Daniel Melameth <dan...@melameth.com>
>>>>> wrote: On Sun, Jan 10, 2016 at 7:58 AM, Marko Cupać
>>>>> <marko.cu...@mimar.rs>
>>> wrote:
>>>>>> On Sat, 9 Jan 2016 11:11:27 -0700
>>>>>> Daniel Melameth <dan...@melameth.com> wrote:
>>>>>>> You NEED to set a max on your ROOT queues.
>>>>>> I came to this conclusion as well. But not only on root queues.
>>>>>> For example, when max is set on root queue but only bandwidth
>>>>>> on child queues, no shaping takes place...
>>>>> This works for me.
>>>>>> Or, to cut the long story short, if someone can paste queue
>>>>>> definition which accomplishes 'give both queues max bandwidth,
>>>>>> but throttle traffic from first queue when traffic from the
>>>>>> second one arrives', I will be more than happy to quit
>>>>>> bothering misc@ list readers with my rants and observations.
>>>>> I would expect this to be possible with prio alone, but I've
>>>>> never been able to get it to work.  Perhaps I'm misunderstanding
>>>>> how prio works.
>>>> prio is basically an array of lists of packets to be transmitted.
>>>> high priority packets go on a different list to low priority
>>>> packets.
>>>>
>>>> the problem is the way packets go on and off these lists.
>>>> basically as soon as a packet is queued on one of these lists for
>>>> transmission, we call the driver immediately to send it. generally
>>>> as soon as a packet is queued on the interface, it immediately
>>>> gets dequeued by the driver and transmitted on the hardware.
>>>>
>>>> it is only when you build up a backlog of packets that priq can
>>>> come into effect. the only way you can build up a backlog of
>>>> packets is if your hardware is slower at transmitting packets than
>>>> the thing that generates these packets to send.
>>>>
>>>> in your case you're probably getting packets from a relatively
>>>> slow internet connection and transmitting them on a high speed
>>>> local network. the transmit hardware is almost certainly going to
>>>> be faster than your source of packets, so you'll never build up a
>>>> queue of backlogged packets, so prio is effectively a nop.
>>>>
>>>> dlg
>>>
>>> Thanks for taking the time to chime in guys.  Prior to implementing
>>> any queueing, I tested this stuff out on a LAN--so no slower
>>> connections were involved--and I was unable to see prio in action, at
>>> least not with any observable similarity to ALTQ's PRIQ.
>>>
>>> A simple rule set:
>>>
>>> match out on egress proto tcp to port 12345 set prio 7
>>> match out on egress proto tcp to port 12346 set prio 0
>>> pass
>>>
>>> Using tcpbench to push packets into both queues, I would have
>>> expected the packets destined for port 12346 to get throttled, but
>>> both flows simply reached an equilibrium, which I would have
>>> expected without prio.  Under PRIQ, I would have seen the flow to
>>> port 12346 get almost completely starved of bandwidth.  When doing
>>> non-prio queuing with a similarly simple ruleset, both flows
>>> properly matched their target bandwidth.
>>
>> This assumes that you manage to fill the TX interface queue to a level
>> that it always fills the tx DMA rings before being empty. On high
>> speed interfaces this is most of the time not the case and so both
>> sessions are able to reach the maximum bandwidth.
>> To be honest prio queues only make sense when you have a slow interface
>> (10Mbps) or a shaper in place that causes the queue to fill up.
>> There is currently no shaper you can use together with the prio
>> queues so only option one remains.
>>
>
> Have we come to the conclusion that currently prio makes no sense at all?

it wont have the effect you want. that doesn't mean it doesn't make sense
somewhere else.

>
> Can I hope that saying 'currently' means this is not the intended
> design? Or should I come to peace with the fact that with OpenBSD and
> PF I can forget about shaping inbound TCP traffic in a way that
> child queues can expand to max link bandwidth unless there is a
> congestion, while in congestion admin can choose which child queues to
> throttle and in which order?

hfsc might need some work at the code level, it might just suck to configure.

>
> --
> Before enlightenment - chop wood, draw water.
> After  enlightenment - chop wood, draw water.
>
> Marko Cupać
> https://www.mimar.rs/

Re: PF: can't make queueing and priority work as expected

2016-01-11 Thread David Gwynne

> On 11 Jan 2016, at 22:43, Daniel Melameth wrote:
>
> On Sun, Jan 10, 2016 at 7:58 AM, Marko Cupać wrote:
>> On Sat, 9 Jan 2016 11:11:27 -0700
>> Daniel Melameth wrote:
>>> You NEED to set a max on your ROOT queues.
>> I came to this conclusion as well. But not only on root queues. For
>> example, when max is set on root queue but only bandwidth on child
>> queues, no shaping takes place...
>
> This works for me.
>
>> Or, to cut the long story short, if someone can paste queue definition
>> which accomplishes 'give both queues max bandwidth, but throttle
>> traffic from first queue when traffic from the second one arrives', I
>> will be more than happy to quit bothering misc@ list readers with my
>> rants and observations.
>
> I would expect this to be possible with prio alone, but I've never
> been able to get it to work.  Perhaps I'm misunderstanding how prio
> works.

prio is basically an array of lists of packets to be transmitted. high
priority packets go on a different list to low priority packets.

the problem is the way packets go on and off these lists. basically as soon as
a packet is queued on one of these lists for transmission, we call the driver
immediately to send it. generally as soon as a packet is queued on the
interface, it immediately gets dequeued by the driver and transmitted on the
hardware.

it is only when you build up a backlog of packets that priq can come into
effect. the only way you can build up a backlog of packets is if your hardware
is slower at transmitting packets than the thing that generates these packets
to send.

in your case you're probably getting packets from a relatively slow internet
connection and transmitting them on a high speed local network. the transmit
hardware is almost certainly going to be faster than your source of packets,
so you'll never build up a queue of backlogged packets, so prio is effectively
a nop.

dlg
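
The behaviour described above can be illustrated with a minimal sketch in C. It is not the kernel's code: the names `prioq`, `prioq_enqueue` and `prioq_dequeue`, and the fixed capacity `QLEN`, are assumptions made up for this example. It only shows the idea of one list per priority, where the dequeue side always serves the highest non-empty list.

```c
#include <stddef.h>

#define NPRIO 8   /* pf priorities run from 0 to 7 */
#define QLEN  64  /* hypothetical capacity per priority, sketch only */

/* One simple FIFO of packet ids per priority level. */
struct prioq {
	int    pkts[NPRIO][QLEN];
	size_t head[NPRIO];   /* next slot to dequeue from */
	size_t tail[NPRIO];   /* next slot to enqueue into */
};

void
prioq_init(struct prioq *pq)
{
	for (int p = 0; p < NPRIO; p++)
		pq->head[p] = pq->tail[p] = 0;
}

/*
 * Append a packet to the list for its priority.
 * Slots are never reused in this sketch, so each list accepts at
 * most QLEN packets in total. Returns -1 on a bad prio or full list.
 */
int
prioq_enqueue(struct prioq *pq, int prio, int pkt)
{
	if (prio < 0 || prio >= NPRIO || pq->tail[prio] == QLEN)
		return (-1);
	pq->pkts[prio][pq->tail[prio]++] = pkt;
	return (0);
}

/* Take the oldest packet from the highest non-empty priority, or -1. */
int
prioq_dequeue(struct prioq *pq)
{
	for (int p = NPRIO - 1; p >= 0; p--) {
		if (pq->head[p] < pq->tail[p])
			return (pq->pkts[p][pq->head[p]++]);
	}
	return (-1);
}
```

If every enqueue is followed immediately by a dequeue, as happens when the hardware is faster than the packet source, packets leave in arrival order and the priority never matters. Reordering only shows up once several packets are waiting at the same time.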

Re: authentication infra structure

2015-12-10 Thread David Gwynne

> On 10 Dec 2015, at 5:25 AM, Friedrich Locke wrote:
>
> If you had about 10k users and 5k machine how would you manage
> authenticating issues? Keep in mind that this is a very heterogenous
> environment with ldap, ftp, smtp, pop3, traditional unix boxes etc 

we use ypldapd talking to our directory to make users appear on the box. nfs
for homedirs. at the moment we're using krb for auth, but im looking to change
that soon.

dlg

Re: Chelsio T4 10g adapters support ?

2015-12-09 Thread David Gwynne

> On 10 Dec 2015, at 12:28, Brendan Horan wrote:
>
> Hi,
>
> I am looking at building a system running OpenBSD to deal with 10g networks.
>
> It would seem there is good support for Intel cards via the "ix" driver.
> However I was looking at Chelsio cards.
> It seems the "che" driver only supports T3 series and the PE9000 cards.
>
> However the T3 series is PCIe 1.1,
> not exactly useful on a dual port 10gbe card.
> Thus I was looking at the T4 series cards.
>
> Would there be much needed to get one of them working on OpenBSD ?
> If the answer to that is "no clue",
> would the card make a good donation to someone at OpenBSD?
> FreeBSD has support for T4 cards if that helps.
>
> I am still unsure if I want this card or an Intel card at this point.
>
> Thanks for your time

you want an ix(4) for now.

there's a few 10g chips we dont have support for yet, but developer time is
more of a constraint than lack of hardware at the moment.

dlg

Re: em(4) watchdog timeouts

2015-11-15 Thread David Gwynne

On Fri, Nov 13, 2015 at 10:18:51AM -0500, Sonic wrote:
> On Wed, Nov 11, 2015 at 9:20 AM, Gregor Best wrote:
> > I've done some further testing and I think I've narrowed it down to the
> > "Unlocking em(4) a bit further"-patch [0].

could you try this? its not written with the wdog stuff in mind,
but it does touch that stuff so it might help.

Index: if_em.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_em.c,v
retrieving revision 1.310
diff -u -p -r1.310 if_em.c
--- if_em.c 29 Oct 2015 03:19:42 -0000  1.310
+++ if_em.c 15 Nov 2015 14:01:39 -0000
@@ -605,16 +605,20 @@ em_start(struct ifnet *ifp)
}
 
for (;;) {
-   IFQ_POLL(&ifp->if_snd, m_head);
-   if (m_head == NULL)
-   break;
-
-   if (em_encap(sc, m_head)) {
+   if (sc->num_tx_desc_avail < EM_MAX_SCATTER + 2) {
ifp->if_flags |= IFF_OACTIVE;
break;
}
 
IFQ_DEQUEUE(&ifp->if_snd, m_head);
+   if (m_head == NULL)
+   break;
+
+   if (em_encap(sc, m_head)) {
+   m_freem(m_head);
+   ifp->if_oerrors++;
+   continue;
+   }
 
 #if NBPFILTER > 0
/* Send a copy of the frame to the BPF listener */
@@ -622,9 +626,6 @@ em_start(struct ifnet *ifp)
bpf_mtap_ether(ifp->if_bpf, m_head, BPF_DIRECTION_OUT);
 #endif
 
-   /* Set timeout in case hardware has problems transmitting */
-   ifp->if_timer = EM_TX_TIMEOUT;
-
post = 1;
}
 
@@ -637,8 +638,11 @@ em_start(struct ifnet *ifp)
 * this tells the E1000 that this frame is
 * available to transmit.
 */
-   if (post)
+   if (post) {
E1000_WRITE_REG(&sc->hw, TDT, sc->next_avail_tx_desc);
+
+   ifp->if_timer = EM_TX_TIMEOUT;
+   }
}
 }
 
@@ -1104,12 +1108,6 @@ em_encap(struct em_softc *sc, struct mbu
struct em_buffer   *tx_buffer, *tx_buffer_mapped;
struct em_tx_desc *current_tx_desc = NULL;
 
-   /* Check that we have least the minimal number of TX descriptors. */
-   if (sc->num_tx_desc_avail <= EM_TX_OP_THRESHOLD) {
-   sc->no_tx_desc_avail1++;
-   return (ENOBUFS);
-   }
-
if (sc->hw.mac_type == em_82547) {
bus_dmamap_sync(sc->txdma.dma_tag, sc->txdma.dma_map, 0,
sc->txdma.dma_map->dm_mapsize,
@@ -1147,9 +1145,6 @@ em_encap(struct em_softc *sc, struct mbu
 
EM_KASSERT(map->dm_nsegs!= 0, ("em_encap: empty packet"));
 
-   if (map->dm_nsegs > sc->num_tx_desc_avail - 2)
-   goto fail;
-
if (sc->hw.mac_type >= em_82543 && sc->hw.mac_type != em_82575 &&
sc->hw.mac_type != em_82580 && sc->hw.mac_type != em_i210 &&
sc->hw.mac_type != em_i350)
@@ -1168,9 +1163,9 @@ em_encap(struct em_softc *sc, struct mbu
 * Check the Address and Length combination and
 * split the data accordingly
 */
-   array_elements = em_fill_descriptors(map->dm_segs[j].ds_addr,
-   map->dm_segs[j].ds_len,
-   &desc_array);
+   array_elements = em_fill_descriptors(
+   map->dm_segs[j].ds_addr,
+   map->dm_segs[j].ds_len, &desc_array);
for (counter = 0; counter < array_elements; counter++) {
if (txd_used == sc->num_tx_desc_avail) {
sc->next_avail_tx_desc = txd_saved;
@@ -2481,8 +2476,7 @@ em_txeof(struct em_softc *sc)
 * If we have enough room, clear IFF_OACTIVE to tell the stack
 * that it is OK to send packets.
 */
-   if (ISSET(ifp->if_flags, IFF_OACTIVE) &&
-   num_avail > EM_TX_OP_THRESHOLD) {
+   if (num_avail > 0 && ISSET(ifp->if_flags, IFF_OACTIVE)) {
KERNEL_LOCK();
CLR(ifp->if_flags, IFF_OACTIVE);
em_start(ifp);
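
The main change to em_start() in the diff above is the order of operations: check for worst-case descriptor space first, and only then take a packet off the send queue, so a dequeued packet never has to be put back. A minimal standalone sketch of that pattern follows. It is not driver code: `struct txring`, `tx_start` and `MAX_SEGS` are hypothetical names for this example, and the "+ 2" slack simply mirrors the `EM_MAX_SCATTER + 2` test in the diff.

```c
#include <stddef.h>

#define MAX_SEGS 4   /* hypothetical worst-case descriptors per packet */

struct txring {
	int free_desc;   /* descriptors currently available */
	int oactive;     /* set when the ring is too full to accept work */
	int sent;        /* packets handed to the "hardware" */
};

/*
 * Drain up to npkts packets from a pretend send queue; packet i needs
 * segs[i] descriptors, and each segs[i] is assumed to be <= MAX_SEGS.
 * Returns how many packets were taken off the queue.
 */
size_t
tx_start(struct txring *r, const int *segs, size_t npkts)
{
	size_t i = 0;

	for (;;) {
		/* Check for worst-case room before touching the queue. */
		if (r->free_desc < MAX_SEGS + 2) {
			r->oactive = 1;
			break;
		}
		if (i == npkts)   /* send queue is empty */
			break;
		/* Safe to commit: the packet is guaranteed to fit. */
		r->free_desc -= segs[i++];
		r->sent++;
	}
	return (i);
}
```

The cost of this design is that the ring is treated as full while a few descriptors are still free, in exchange for never needing a poll-then-dequeue pair or a requeue path.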

Re: Dell S300 controller

2015-05-08 Thread David Gwynne

> On 8 May 2015, at 12:41 pm, Jim Giannoules j...@devio.us wrote:
>
> On Tue, May 05, 2015 at 06:54:37PM +0000, Stuart Henderson wrote:
>> On 2015-05-05, Jack Peirce jpei...@sourcecode.com wrote:
>>> On Mon, May 04, 2015 at 08:22:28PM -0400, Steve Shockley wrote:
>>>> Does anyone know if the Dell PERC S300 controller will work under
>>>> OpenBSD as a non-RAID SAS HBA?  It has an LSI SAS 1068e, but I didn't
>>>> know if they did something to make it not work as an HBA.  Thanks.
>>>
>>> I don't believe the controller will automatically export unconfigured
>>> drives as single drive units. LSI makes 2 different versions of
>>> firmware for the unbranded controllers, IR mode for RAID and IT mode
>>> for HBA, but it's not possible/easy to flash them to the Dell branded
>>> controllers.
>>>
>>> Create RAID0 single drive units on each disk and it should export.
>>
>> AFAIK the S300 doesn't work at all on OpenBSD (or Linux). It was only ever
>> meant to work with Windows.
>
> The Dell PERC S300 is a SWRAID product. It is correct that the hardware is
> an LSI1068e, but programmed with modified PCI IDs (all four are different:
> vendor, device, sub-vendor, sub-device). The expansion ROM and drivers are
> from DotHill systems and are looking for these updated IDs. The controller
> itself is running the IR/IT firmware with IR soft-disabled. To turn the
> controller back into a normal LSI1068e you would need to update the expansion
> ROM and the PCI IDs.
>
> As a science experiment you might be able to modify mpi(4) to look for the
> S300 IDs, but that would be an OS runtime only fix.
>

im pretty sure the s300 is actually the ahci ports coming off the motherboard. 
if its in ahci mode it should Just Work(tm) as a sata controller. not sas, 
sorry.

the h200 was the last straight sas hba you could get in a dell. if you want sas 
ports in their more recent machines you can configure physical disks on a h330 
or h730, both of which are mfii controllers.

dlg

Re: Not Detecting Broadcom NetXtreme II 10GBase-T adapter

2015-03-10 Thread David Gwynne

i havent written a driver for it yet.

> On 10 Mar 2015, at 10:07 pm, Ninad Shaha ninadsh...@iitb.ac.in wrote:
>
> Dear All,
>
> I have installed OpenBSD 5.6 on IBM X3650 M4 server. This server
> contains 2 numbers of Broadcom NetXtreme II BCM57712 10GBase-T dual port
> adapter. This adapter is not visible or detected by OpenBSD. It just
> shows not configured in dmesg.
>
> Following is the dmesg output from above server. Please guide me for the
> same as I am new to BSD.
>
> OpenBSD 5.6 (GENERIC.MP) #333: Fri Aug  8 00:20:21 MDT 2014
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> RTC BIOS diagnostic error 80clock_battery
> real mem = 34315374592 (32725MB)
> avail mem = 33393123328 (31846MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0x7e7be000 (82 entries)
> bios0: vendor IBM version -[VVE142AUS-1.70]- date 06/04/2014
> bios0: IBM 00Y7683
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S5
> acpi0: tables DSDT FACP TCPA ERST HEST HPET APIC MCFG OEM0 OEM1 SLIT
> SRAT SLIC SSDT SSDT SSDT SSDT SSDT SSDT SSDT DMAR
> acpi0: wakeup devices MRP1(S4) DCC0(S4) ENET(S4) MRP3(S4) MRP5(S4)
> EHC2(S5) PEX0(S5) PEX7(S5) EHC1(S5) IP2P(S3) MRPB(S4) MRPC(S4) MRPD(S4)
> MRPM(S4) MRPE(S4) MRPF(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 14318179 Hz
> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 3000.46 MHz
> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,LONG,LAHF,PERF,ITSC
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 3000.00 MHz
> cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,LONG,LAHF,PERF,ITSC
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 3000.00 MHz
> cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,LONG,LAHF,PERF,ITSC
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 6 (application processor)
> cpu3: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 3000.00 MHz
> cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,LONG,LAHF,PERF,ITSC
> cpu3: 256KB 64b/line 8-way L2 cache
> cpu3: smt 0, core 3, package 0
> cpu4 at mainbus0: apid 8 (application processor)
> cpu4: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 3000.00 MHz
> cpu4: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,LONG,LAHF,PERF,ITSC
> cpu4: 256KB 64b/line 8-way L2 cache
> cpu4: smt 0, core 4, package 0
> cpu5 at mainbus0: apid 10 (application processor)
> cpu5: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 3000.00 MHz
> cpu5: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,LONG,LAHF,PERF,ITSC
> cpu5: 256KB 64b/line 8-way L2 cache
> cpu5: smt 0, core 5, package 0
> cpu6 at mainbus0: apid 12 (application processor)
> cpu6: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 3000.00 MHz
> cpu6: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,LONG,LAHF,PERF,ITSC
> cpu6: 256KB 64b/line 8-way L2 cache
> cpu6: smt 0, core 6, package 0
> cpu7 at mainbus0: apid 14 (application processor)
> cpu7: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 3000.00 MHz
> cpu7:

Re: Emulate apache mod_rewrite feature with new httpd

2015-02-22 Thread David Gwynne

> On 22 Feb 2015, at 20:23, Romain FABBRI romain.fab...@alienconsulting.net wrote:
>
> I've managed to configure the new httpd server to use as a replacement for
> apache. (Which is really great.  Thanks to Reyk !)
>
> I'm struggling to make my drupal site work, because of the clean url module.
>
> I used to have the following apache mod_rewrite configuration :
> RewriteEngine on
> RewriteBase /
> RewriteCond %{REQUEST_FILENAME} !-f
> RewriteCond %{REQUEST_FILENAME} !-d
> RewriteRule ^(.*)$ /index.php?q=$1 [L,QSA]
>
> Basically the rule means that if the file or folder is not found then the
> request is rewritten to /index.php?q=request
> For example, if /user doesn't exist, modify the url to /index.php?q=user
>
> I've looked into man page for httpd and I've seen that the block return
> statement might be of use to emulate this need. but I haven't found many info
> on the subject.
>
> Has someone found a way to make that with the new httpd server ?
>
> PS : I'm running from snapshot (5.7 GENERIC#716 i386)
>
> Romain

i havent tried drupal behind httpd yet, but if i did i would unconditionally 
route requests into the drupal controller (index.php), and use a cdn module to 
have drupal generate urls to static assets (ie, the css/js/image files on disk) 
against a separate domain or url prefix. or you could write a simple module 
that takes advantage of hook_file_url_alter. that has greatly simplified our 
configs in the frontend web servers in front of our drupal poop.

Re: YP Alternative

2015-01-04 Thread David Gwynne

> On 4 Jan 2015, at 5:32 pm, Brian Empson br...@teamhandbanana.com wrote:
>
> This sounds interesting. What would you replace krb5 with, if you don't mind
> me asking? I was contemplating krb5, but the setup and such is a pain for me
> (because I am not familiar with it). I'll probably wind up rolling something
> custom with LDAP and YP mappings thrown in.

i dunno. ideally i would just do basic auth over https against something that 
just returns 200 or 403. bsdauth on openbsd means i could probably implement 
that with a crappy script. linux probably has a crazy pam module i could use to 
do auth with http, but the solarish things i run almost certainly dont.

however, linux and solaris still support krb5 auth out of the box, so its only 
a problem i really have to solve on openbsd. or use ldap auth.

 
> On 1/4/2015 2:26 AM, David Gwynne wrote:
>> On 2 Jan 2015, at 9:52 pm, Brian Empson br...@teamhandbanana.com wrote:
>>>
>>> I'm looking into a way to sync up group and user information across a
>>> network of OpenBSD machines. I like YP, except that I don't need the
>>> password hashes transferred across the network. I like that it's built
>>> right into the base install, are there better ways to handle synchronizing
>>> login details across multiple machines that is built into the base install?
>>> Preferably written by the OpenBSD team, too?
>> while not directly answering your question, i can say openbsd can do this
>> kind of stuff without yp on the wire.
>>
>> at work i use ypldap to get user/group information from active directory. we
>> populate the rfc2307 attributes on our users and groups to make them useful
>> on unix systems. we use the single directory as a name service backend for
>> openbsd, solaris, linux, and windows (of course).
>>
>> we're still using krb5 for password authentication. i really have to fix
>> that.
>>
>> we've also augmented the AD schema to store users ssh keys in the directory
>> too. sshd gets access to them via AuthorizedKeysCommand and a perl script.
>> this allows ssh key based single sign on across all our unixish systems,
>> even if their home directories are not available on the system. this is
>> useful for providing services over ssh. an example of such a service we
>> provide is svn and git on a dedicated server. all our users are on the
>> system via ypldap, and they can auth using their own username and either a
>> password or ssh key.
>>
>> dlg
