Re: Openbsd VMM with VLAN

2021-06-01 Thread David Gwynne
Hi Irshad,

Assuming I understand your layout correctly, you should be able to use 
hostname.if configuration files like the following:

$ cat hostname.em0
up

$ cat hostname.vlan20
description "Trusted (L2+L3)"
vnetid 20 parent em0
inet aa.bb.cc.dd 255.255.255.0
up

$ cat hostname.vlan10
description "IoT (L2)"
vnetid 10 parent em0
up

$ cat hostname.veb10
description "IoT bridge"
add vlan10
add vport10
up

$ cat hostname.vport10
description "IoT (L3)"
inet ee.bb.cc.dd 255.255.255.0
up

With the above, VLAN 10 on the wire will be connected by veb10 to the IP 
stack on your firewall via vport10. To have the virtual machine also plug into 
that VLAN 10 Ethernet segment, you can use veb10 as your "uplink" switch 
interface in vm.conf.
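For illustration, a sketch of Irshad's /etc/vm.conf with the "uplink" switch pointed at veb10 instead of bridge1, written out from a root shell on the VMM host. It assumes the hostname.if files above are in place on that same box; apart from the switch interface, everything is copied from his mail.

```sh
#!/bin/sh
# Irshad's vm.conf with the "uplink" switch moved from bridge1 to veb10.
cat > /etc/vm.conf <<'EOF'
switch "uplink" {
	interface veb10
}
vm "hass" {
	disable
	owner irshad
	memory 2G
	disk "/home/irshad/iso/disk.qcow2"

	interface {
		switch "uplink"
		lladdr fe:e1:bb:01:01:01
	}
}
EOF
# Re-read vm.conf, then start the VM by hand (it is marked "disable").
vmctl reload
vmctl start hass
```

Because the VM is marked disable it is not started at boot, which is why the last command starts it explicitly.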

dlg

> On 31 May 2021, at 05:44, Irshad  wrote:
> 
> Hi all 
> 
> 
> I have two OpenBSD boxes set up like below, one as a firewall and another
> one as a VMM host, with two VLANs.
> 
>OPENBSD_FIREWALL
> 
> IoT_AP  (VLAN10) . -VLAN10
>|--OpenWRT-em0---| ---pf --em1--Internet
>||- VLAN20 
> trusted_AP(VLAN20)  
> this Works fine  
> 
> 
> Another Separate OpenBSD Box for VM 
> 
> openbsd(vmGuest)---vether0---openbsdHost——NAT—em0--OpenBSD_FW--Internet
> 
> Is it possible to add openbsd(vmguest) to the VLAN10 network?
> 
> 
> This is my VM config [Home Assistant]:
> 
> 
> 
> switch "uplink" {
>interface bridge1
> }
> vm "hass" {
>disable
>owner irshad
>memory 2G
>disk "/home/irshad/iso/disk.qcow2"
> 
>interface {
>switch "uplink"
>lladdr fe:e1:bb:01:01:01
>}
> }
> 
> 
> 



Re: Home Assistant

2021-05-11 Thread David Gwynne



> On 11 May 2021, at 05:01, pas...@pascallen.nl wrote:
> 
> Dear David,
> 
> How do you start homeassistant after a reboot? Manually?

i have these scripts. the pexp in the rc script doesn't work, but i haven't 
needed it to yet.

apathy$ cat /etc/rc.d/hass   
#!/bin/ksh

daemon="/opt/local/sbin/hass --daemon"
daemon_user="_hass"

pexp="/opt/hass/bin/hass"

. /etc/rc.d/rc.subr

rc_reload=NO

rc_cmd $1
apathy$ cat /opt/local/sbin/hass 
#!/bin/ksh

. /opt/hass/bin/activate

/opt/hass/bin/hass "$@"
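One guess at why the pexp doesn't match, assuming rc.subr(8) checks for the daemon with pgrep -q -xf "${pexp}": with -x the pattern has to match the process's whole argument list, and a virtualenv entry point usually runs with the interpreter in front of the script path. The command line below is hypothetical (check the real one with pgrep -lf hass), and matches_pexp is a made-up helper that mimics the whole-line match with grep -Ex so candidate patterns can be tried offline.

```sh
#!/bin/sh
# Hypothetical helper: succeed if extended regex $1 matches the *entire*
# argument list $2, approximating what pgrep -x -f does with a pexp.
matches_pexp() {
	printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# Example command line (an assumption, not taken from a real system).
cmdline='/opt/hass/bin/python3.8 /opt/hass/bin/hass --daemon'

matches_pexp '/opt/hass/bin/hass' "$cmdline" || echo "exact path: no match"
matches_pexp '.*/opt/hass/bin/hass.*' "$cmdline" && echo "wildcarded: match"
```

If the real argument list does start with the interpreter, a wildcarded pexp along the lines of the second pattern is the kind of change to try in /etc/rc.d/hass.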




Re: pf ipv6 source-routing 6.9

2021-05-10 Thread David Gwynne



> On 10 May 2021, at 8:05 pm, Bastien Durel  wrote:
> 
> On Saturday 8 May 2021 at 12:07 +0200, Bastien Durel wrote:
>> On 08/05/2021 at 11:56, Stuart Henderson wrote:
> Does it work if you use the syntax suggested in the upgrade
> notes
> for the example with "pass in on pppoe1 reply-to ..."?
> 
> 
 For incoming connections, I tried
 
 pass in on pppoe0 inet6 reply-to fe80::520f:80ff:fe65:8800%pppoe0 keep state
 pass in on pppoe0 inet6 reply-to fe80::520f:80ff:fe65:8800 keep state
> 
> Hello,
> 
> Thanks to folks of #openbsd, I found out adding an explicit route to
> fe80::520f:80ff:fe65:8800 on pppoe0 makes this work.
> Referencing fe80::520f:80ff:fe65:8800%pppoe0 in pf.conf results in a
> rule referencing fe80::520f:80ff:fe65:8800
> 
> pf.conf:
> pass in on pppoe0 inet6 reply-to fe80::520f:80ff:fe65:8800%pppoe0
> pfctl -s rules:
> pass in on pppoe0 inet6 all flags S/SA reply-to fe80::520f:80ff:fe65:8800
> 
> hostname.pppoe0:
> !/sbin/route add -inet6 fe80::520f:80ff:fe65:8800 -ifp pppoe0 fe80::%pppoe0
> 
> This makes pf able to route to the correct interface.

You're right, pf isn't very good at handling link-local v6 addresses. This is 
annoying now that route-to uses addresses as its argument, if you want to move 
ipv6 packets toward a host with a link local address.

In this situation the least worst way to cope with the problem for now is to 
use route-to (pppoe0:0). This should work because route-to doesn't do any local 
address checks on the destination address it resolves. Once it looks up the 
local address as the direction to send the packet, it should put it straight 
out pppoe0. ppp as a tunnel interface has no address resolution protocol, it 
just encapsulates the packet it is given and sends it on its way.
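Applied to Bastien's incoming rule, that would look something like the line below. This is a sketch: it assumes reply-to accepts the same (pppoe0:0) form as route-to, and pfctl -n only parses the rule (read from standard input via -f -), it does not load it.

```sh
#!/bin/sh
# Parse-check only (-n); nothing is loaded into the running ruleset.
echo 'pass in on pppoe0 inet6 reply-to (pppoe0:0)' | pfctl -nvf -
```

If it parses, the same rule would replace the fe80:: reply-to rule in /etc/pf.conf.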

route-to also takes a destination address as an argument, not a gateway 
address. If dhcp6c sets up a route to some global address that you know about 
(I'm not sure this is a thing but it might be), you can use that global address 
as the argument to route-to and it will send it in the right direction.

dlg


> Regards,
> 
> -- 
> Bastien
> 



Re: virtual cluster with rdomain(4)

2021-05-10 Thread David Gwynne
he: 100, timeout: 240):
fe:e1:ba:d2:4a:be vport1 16 flags=0<>
fe:e1:ba:d3:17:a0 vport2 16 flags=0<>
ix#

dlg

> 
> thanks
> Thomas
> 
> On Mon, 10 May 2021 at 08:10, David Gwynne  wrote:



Re: Home Assistant

2021-05-10 Thread David Gwynne
i've been running hass on openbsd for a while now, and just did a new
install on 6.9 for my boss on the weekend.

i set up a _hass user for it to run as, and gave it /opt/hass:

hass$ getent passwd _hass
_hass:*:2000:2000:Home Assistant:/opt/hass:/sbin/nologin
hass$ getent group 2000
_hass:*:2000
hass$ ls -ld /opt/hass
drwxr-xr-x  8 _hass  _hass  512 May  8 22:35 /opt/hass

i installed mosquitto, python3.8, py3-virtualenv, py3-pip,
py3-cryptography, py3-Pillow, and py3-zeroconf from ports. then
as the _hass user i set up a venv in /opt/hass with virtualenv
--system-site-packages /opt/hass, did the . /opt/hass/bin/activate
thing, then ran pip install homeassistant.
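For anyone wanting to reproduce this, a sketch of those steps as root on 6.9, installing the packages with pkg_add. The groupadd/useradd/install/su invocations are a reconstruction, not the exact commands from the mail; the uid/gid 2000, the package list and the virtualenv/pip steps come from the text above.

```sh
#!/bin/sh
# Service account and home directory matching the getent/ls output above.
groupadd -g 2000 _hass
useradd -u 2000 -g _hass -c "Home Assistant" -d /opt/hass -s /sbin/nologin _hass
install -d -o _hass -g _hass /opt/hass

pkg_add mosquitto python%3.8 py3-virtualenv py3-pip \
	py3-cryptography py3-Pillow py3-zeroconf

# _hass has /sbin/nologin as its shell, so ask su(1) for /bin/sh;
# the arguments after the login name are passed to that shell.
su -l -s /bin/sh _hass -c '
	virtualenv --system-site-packages /opt/hass &&
	. /opt/hass/bin/activate &&
	pip install homeassistant
'
```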

that got me far enough to be able to start home assistant. you're
on your own after this.

good luck.

dlg

On Sat, May 08, 2021 at 06:53:54PM +0200, pas...@pascallen.nl wrote:
> Dear all,
> 
> What would be the best way to install HASS on Openbsd?
> Containers are a nogo?
> 
> Run it in virtual env from python?
> 
> Any Howto on the subject with Openbsd?
> 
> 
> Currently I got it running as from the website with the "core" version.
> But a startup script which runs with a non-root user is where I get
> stuck.
> 
> 
> 
> 
> -- 
> Met vriendelijke groet,
> 
> Pascal Huisman
> 
> 
> Fundamentally, there may be no basis for anything.
> 




Re: virtual cluster with rdomain(4)

2021-05-10 Thread David Gwynne
Hi Thomas,

I'd give this a go with vport(4) interfaces instead of vether(4), and join them 
all together at layer 2 by adding them to a single veb(4).
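A sketch of that suggestion, run as root on the VM after destroying the existing vether interfaces. Beyond the mail, it assumes vport0 to vport4 (one per rdomain) share a single subnet, 10.10.0.0/24, instead of a /24 each, so the rdomains reach each other directly over the veb without pf routing between them. veb0 and the addresses are hypothetical.

```sh
#!/bin/sh
# Assumed layout: vport$i lives in rdomain $i and has 10.10.0.(i+1)/24;
# all five are ports on veb0, i.e. one shared Ethernet segment.
ifconfig veb0 create
for i in 0 1 2 3 4; do
	ifconfig vport$i create
	# Set the rdomain first: moving an interface to another
	# rdomain drops the addresses already configured on it.
	ifconfig vport$i rdomain $i
	ifconfig vport$i inet 10.10.0.$((i + 1))/24 up
	ifconfig veb0 add vport$i
done
ifconfig veb0 up

# From rdomain 3, the hub address in rdomain 0 should answer over veb0.
ping -V 3 -c 1 10.10.0.1
```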

Cheers,
dlg

> On 10 May 2021, at 03:04, Thomas Huber  wrote:
> 
> Hi misc,
> 
> I wanted to tinker with the cluster manager sysutils/nomad but
> unfortunately I've no spare cluster for tinkering...
> 
> So I had the idea of utilizing OpenBSDs outstanding
> possibilities for network isolation to create a
> virtual cluster on my VM at openbsd.amsterdam.
> 
> I had different ideas to achieve it but nothing worked so far.
> So I'd describe my first approach because I think this is the
> most OpenBSD idiomatic one:
> 
> I created 5 vether[0-4] devices, everyone in its own rdomain [0-4]
> and assigned every device its own inet address space 10.10.[0-4].1/24
> 
> I also set the 10.10.[0-4].1 as default route in each rtable.
> 
> Now I learned that pf(4) is needed to route between this 5 rdomains
> but after several attempts I've no clue how this could be defined.
> 
> Actually I wanted rdomain 0 to work as hub for all rdomains >0.
> Maybe someone can hint me in the right direction
> 
> regards
> Thomas (host of the u2k20-hackathon, if someone remembers ;-)
> 
> some further listings if my description above is unclear:
> 
> 
> ud$ ifconfig vether
> vether0: flags=8843 mtu 1500
> lladdr fe:e1:ba:d7:cc:16
> index 23 priority 0 llprio 3
> groups: vether
> media: Ethernet autoselect
> status: active
> inet 10.10.0.1 netmask 0xff00 broadcast 10.255.255.255
> 
> vether1: flags=8843 rdomain 1 mtu 1500
> lladdr fe:e1:ba:d8:73:32
> index 24 priority 0 llprio 3
> groups: vether
> media: Ethernet autoselect
> status: active
> inet 10.10.1.1 netmask 0xff00 broadcast 10.255.255.255
> 
> vether2: flags=8843 rdomain 2 mtu 1500
> lladdr fe:e1:ba:d9:bd:e8
> index 26 priority 0 llprio 3
> groups: vether
> media: Ethernet autoselect
> status: active
> inet 10.10.2.1 netmask 0xff00 broadcast 10.255.255.255
> 
> vether3: flags=8843 rdomain 3 mtu 1500
> lladdr fe:e1:ba:da:07:4d
> index 28 priority 0 llprio 3
> groups: vether
> media: Ethernet autoselect
> status: active
> inet 10.10.3.1 netmask 0xff00 broadcast 10.255.255.255
> 
> vether4: flags=8843 rdomain 4 mtu 1500
> lladdr fe:e1:ba:db:31:c8
> index 30 priority 0 llprio 3
> groups: vether
> media: Ethernet autoselect
> status: active
> inet 10.10.4.1 netmask 0xff00 broadcast 10.255.255.255
> 
> ud$ netstat -R
> Rdomain 0
>  Interfaces: lo0 vio0 enc0 pflog0 vether0
>  Routing tables: 0 71
> 
> Rdomain 1
>  Interfaces: vether1 lo1
>  Routing table: 1
> 
> Rdomain 2
>  Interfaces: vether2 lo2
>  Routing table: 2
> 
> Rdomain 3
>  Interfaces: vether3 lo3
>  Routing table: 3
> 
> Rdomain 4
>  Interfaces: vether4 lo4
>  Routing table: 4



Re: Working with encapsulated traffic using PF (pass incoming IPv4 from IPv6 gif tunnel)

2021-04-14 Thread David Gwynne


> On 9 Apr 2021, at 18:55, Martin  wrote:
> 
> Hello list,
> 
> I have working IPv4 OpenBSD router. There are no problems with native IPv4 
> and IPv6 traffic filtering/redirecting at all.
> 
> Now stuck with filtering IPv4 traffic encapsulated in IPv6 tunnel using gif 
> interface.
> 
> IPv6 interface is tun0 which has assigned unique IPv6 address, and gif0 has 
> the same unique IPv6 as tun0 with wrapped IPv4 into IPv6 as shows in configs.
> 
> The same configuration from the opposite side, except IPv4 and IPv6 source 
> and destination addresses reversed to make a tunnel.
> 
> I'm not sure if I needed to use a bridge between tun0 and gif0 to have it 
> working.
> 
> Looking for appropriate PF filtering rule to pass IPv4 encapsulated traffic 
> appearing on tun0 and blocks by "block all" PF rule for some reason.
> 
> Any ideas welcome.
> 
> === Side-a ===
> 
> # cat /etc/hostname.gif0
> # gif0
> up
> description 'IPv4 over IPv6 tunnel'
> # tunnel [src IPv6] [dst IPv6]
> tunnel :::::18b5 :::::a503
> inet alias 10.190.0.1
> dest 10.190.0.2
> 
> # ifconfig tun0
> tun0: flags=8051 mtu 1500
>index 44 priority 0 llprio 3
>groups: tun
>status: active
>inet6 fe80::5054:ffc:fe04:f824%tun0 ->  prefixlen 64 scopeid 0x2c
>inet6 :::::18b5 ->  prefixlen 48
> 
> === Side-b ===
> 
> # cat /etc/hostname.gif0
> # gif0
> up
> description 'IPv4 over IPv6 tunnel'
> # tunnel [src IPv6] [dst IPv6]
> tunnel :::::a503 :::::18b5
> inet alias 10.190.0.2
> dest 10.190.0.1
> 
> # ifconfig tun0
> tun0: flags=8051 mtu 1500
>index 44 priority 0 llprio 3
>groups: tun
>status: active
>inet6 fe80::2a15:f3af:fefb:a3b0%tun0 ->  prefixlen 64 scopeid 0x2c
>inet6 :::::a503 ->  prefixlen 48
> 

Hi Martin,

bridge(4) only works with Ethernet interfaces, there is no equivalent to 
bridge(4) for tunnels. I don't think that's related or necessary for solving 
your problem though.

Without a look at your ipv6 routing table it's hard to tell what could be 
happening here. My first impression is that your routers don't have routes for 
the IPv6 endpoints over the tun0 interfaces. For this to work, I'd expect to 
see something like this in your tun0 output:

=== Side-a ===

# ifconfig tun0
tun0: flags=8051 mtu 1500
   index 44 priority 0 llprio 3
   groups: tun
   status: active
   inet6 fe80::5054:ffc:fe04:f824%tun0 ->  prefixlen 64 scopeid 0x2c
   inet6 :::::18b5 -> :::::a503 prefixlen 128

and:

=== Side-b ===

# ifconfig tun0
tun0: flags=8051 mtu 1500
   index 44 priority 0 llprio 3
   groups: tun
   status: active
   inet6 fe80::2a15:f3af:fefb:a3b0%tun0 ->  prefixlen 64 scopeid 0x2c
   inet6 :::::a503 -> :::::18b5 prefixlen 128

This isn't strictly necessary though; the important thing is that the route to 
the dst IPv6 endpoint is over tun0. You should be able to check if that is the 
case with "route get [dst IPv6]" and looking for tun0 in the "interface:" line. 
You should also be able to ping6 between the IPv6 tunnel endpoints. If ping6 
isn't working, then I wouldn't expect gif traffic to work either.
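That check can be scripted. This is a sketch: route_iface and check_tunnel_route are hypothetical helper names, and it assumes the route get output has an "interface:" line as described above. The parsing is split out so it can be tried on saved output.

```sh
#!/bin/sh
# Print the outgoing interface from "route get" output read on stdin.
route_iface() {
	awk '$1 == "interface:" { print $2; exit }'
}

# Succeed if the route to IPv6 endpoint $1 goes out interface $2.
# Usage on each router: check_tunnel_route <dst IPv6> tun0
check_tunnel_route() {
	ifp=$(route -n get "$1" | route_iface)
	[ "$ifp" = "$2" ]
}
```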

Cheers,
dlg



Re: divert with rdr-to not working properly

2021-04-07 Thread David Gwynne
On Mon, Apr 05, 2021 at 09:51:53AM +0300, Hakan SARIMAN wrote:
> Hello Misc,
> 
> 
> I think divert-packet feature with NAT/NAPT is broken.
> 
> I can not reach to web server when I use divert-packet with rdr-to.
> 
> Is this a known bug or a new issue?

There are no other options? Just those two?

I think it's been around for a long time, but no one's hurt themselves
with it because they haven't combined nat/rdr with divert-packet
yet.

I believe the diff below will fix the bug. There's some discussion going
on behind the scenes about whether this is the right fix though.

> 
> When I use divert-packet + rdr-to here is the situation:
> 
> 
> # MY PF RULES
> 
> pass in log quick on pppoe0 inet proto tcp from any to (pppoe0:0) port 81 rdr-to 10.10.12.27 port 81
> 
> pass out log quick on vport12 inet proto tcp from any to 10.10.12.27 port 81 divert-packet port 700

Index: pf.c
===
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.1112
diff -u -p -r1.1112 pf.c
--- pf.c23 Feb 2021 11:43:40 -  1.1112
+++ pf.c5 Apr 2021 10:16:31 -
@@ -6848,8 +6848,10 @@ pf_test(sa_family_t af, int fwdir, struc
 	if ((*m0)->m_pkthdr.pf.flags & PF_TAG_GENERATED)
 		return (PF_PASS);
 
-	if ((*m0)->m_pkthdr.pf.flags & PF_TAG_DIVERTED_PACKET)
+	if ((*m0)->m_pkthdr.pf.flags & PF_TAG_DIVERTED_PACKET) {
+		CLR((*m0)->m_pkthdr.pf.flags, PF_TAG_DIVERTED_PACKET);
 		return (PF_PASS);
+	}
 
 	if ((*m0)->m_pkthdr.pf.flags & PF_TAG_REFRAGMENTED) {
 		(*m0)->m_pkthdr.pf.flags &= ~PF_TAG_REFRAGMENTED;
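The diff makes pf_test() clear PF_TAG_DIVERTED_PACKET when it passes a packet reinjected from a divert socket, so the tag only exempts the packet from that one pass through pf instead of every later one. To try it, a sketch of the usual steps from release(8) on a machine with /usr/src checked out; /tmp/pf-divert.diff is a hypothetical file holding the diff above.

```sh
#!/bin/sh
# Apply the diff (it names pf.c relative to sys/net) and build a new kernel.
cd /usr/src/sys/net && patch -p0 < /tmp/pf-divert.diff || exit 1
cd /usr/src/sys/arch/$(machine)/compile/GENERIC.MP || exit 1
# Only reboot into the new kernel if every build step succeeded.
make obj && make config && make && make install && reboot
```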



Re: What determines source IP of traffic from OpenBSD box ?

2021-02-28 Thread David Gwynne
On Sun, Feb 28, 2021 at 01:17:01PM +0100, Rachel Roch wrote:
> 
> 
> 
> 28 Feb 2021, 11:28 by s...@spacehopper.org:
> 
> > On 2021/02/28 11:46, Rachel Roch wrote:
> >
> >> Thank you all for the suggestions, I am currently testing a few of them.
> >>
> >> Incase it makes any difference, the underlying problem I have is I have 
>> two firewalls with BGP upstreams, one acting as primary, one as standby. 
> >> So the problem I am seeing is the age-old problem of asymmetric traffic to 
> >> the secondary firewall meaning pkg_add on the secondary doesn't work.
> >>
> >
> > You can't just get two sessions from your upstreams so they can both be
> > active rather than one in standby?
> >
> 
> Maybe my wording is a little off.
> 
> I do have independent sessions from FW1 and FW2 to upstream routers.
> 
> The problem, I suspect, is more to do with overlapping of IP ranges being 
> advertised to upstreams, and hence traffic never making it back to FW2 
> because FW1 picks it up, hence the desire to have an effective way to tell 
> OpenBSD "send all localhost originating traffic from lo2 because the IPs on 
> lo2 are exclusive to that host".

I have a situation like that at work which I solved using the following
rules:

# let us talk to things
  match out on vlan363 to !vlan363:network !received-on any nat-to lo1
  match out on vlan364 to !vlan364:network !received-on any nat-to lo1
  pass out !received-on any

vlan363 and vlan364 are the links I use to talk to the rest of the
world.

There may be a less worse way to do that with the routing table now
though.
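As for the routing table: on releases whose route(8) has a sourceaddr command (an assumption worth checking against route(8) on your system), something like the following should make lo2's address the preferred source for traffic the host itself originates in routing table 0, without nat-to. lo2 is the interface from Rachel's mail.

```sh
#!/bin/sh
# Prefer lo2's address as the source for locally originated traffic in
# rtable 0 (assumes route(8) supports "sourceaddr" with "-ifp").
route -T 0 sourceaddr -ifp lo2
```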



Re: seeing carp interface state change for unknown reason ; cluestick hunting

2021-02-01 Thread David Gwynne



> On 1 Feb 2021, at 6:02 pm, Bryan Stenson  wrote:
> 
> Hi all -
> 
> I'm trying to setup a pair of ERL3 octeon routers in master/standby
> mode via carp/pfsync to route traffic from my internal lan to the
> internet.  I've seen strange behavior wrt carp on these machines, so
> in an attempt to reduce the problem, I've removed one completely.
> 
> Even with only a single box (ERL3-01) on the network configured as a
> carp member, the carp interface state periodically changes (as seen
> from ifstated(8)).
> 
> I'm wondering if disconnecting the other ERL3 device is a valid isolated test.
> 1.  Will/might this cause issues with the carp device, as it cannot
> determine state from any other host?

If carp state flaps around while it is the only device on the network, that 
would imply the parent device is flapping around.

> 2.  Will/might this cause issues as it cannot send/receive pfsync
> updates (the other node is disconnected).

pfsync doesn't really care about carp state.

> 3.  Is there something else in my setup causing carp to fail here?

I'd be running "route monitor" and looking for link state changes on the carp 
parent interface.
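A sketch for leaving that running and catching flaps with timestamps. watch_ifs is a hypothetical helper; it assumes route monitor's messages name the interface and are flushed per message when piped. vlan100 (the carpdev) and cnmac2 (its parent) come from Bryan's config.

```sh
#!/bin/sh
# Hypothetical helper: copy stdin lines that mention any interface named
# in the arguments to stdout, each prefixed with the current time.
# Names are matched as substrings, so "cnmac2" would also match "cnmac20".
watch_ifs() {
	while IFS= read -r line; do
		for ifname in "$@"; do
			case $line in
			*"$ifname"*)
				printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
				break
				;;
			esac
		done
	done
}

# On the ERL3, as root:
#   route -n monitor | watch_ifs vlan100 cnmac2 carp1
```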

> 4.  Could this be hardware/temperature related to this ERL3?  Wouldn't
> I see an additional error in dmesg if the physical device (cnmac2)
> failed periodically?
> 
> I'd appreciate any pointers here...I feel like I'm missing something dumb.

My first ideas are above. If it turns out the carp parent is stable we can try 
to come up with something else.

dlg

> 
> Thanks in advance.
> 
> Bryan
> 
> Here are some of my configs.  If I've missed including something
> critical to help describe my setup, please let me know and I'll add
> it.
> 
> ## Help me OBSD-Misc Kenobi.  You're my only hope. ##
> 
> erl3-01# uname -a
> OpenBSD erl3-01.siliconvortex.com 6.8 GENERIC#522 octeon
> 
> erl3-01# dmesg
> ...
> carp1: state transition: BACKUP -> MASTER
> carp1: state transition: BACKUP -> MASTER
> carp1: state transition: BACKUP -> MASTER
> carp1: state transition: BACKUP -> MASTER
> carp1: state transition: BACKUP -> MASTER
> carp1: state transition: BACKUP -> MASTER
> 
> erl3-01# tail mbox
> Mon, 1 Feb 2021 06:49:26 + (UTC)
> From: Charlie Root 
> Date: Mon, 1 Feb 2021 06:49:25 + (UTC)
> To: root@localhost
> Subject: carp master changed
> Message-ID: <515eb74cff427...@erl3-01.siliconvortex.com>
> Status: RO
> 
> master is now erl3-01.siliconvortex.com
> 
> 
> erl3-01# sysctl -a | grep carp
> net.inet.carp.allow=1
> net.inet.carp.preempt=1
> net.inet.carp.log=2
> 
> erl3-01# cat /etc/hostname.carp1
> #carp for lan side
> 192.168.122.1/23 carpdev vlan100 vhid 1 pass somethinglongandsecret
> 
> erl3-01# cat /etc/hostname.vlan100
> vnetid 100 parent cnmac2
> up
> 
> erl3-01# cat /etc/hostname.cnmac2
> inet 192.168.1.253 255.255.254.0
> 
> erl3-01# cat /etc/hostname.pfsync0
> up syncdev cnmac1
> 
> erl3-01# cat /etc/hostname.cnmac1
> inet 10.10.200.1 255.255.255.252
> 
> erl3-01# cat /etc/ifstated.conf
> # Initial State
> init-state auto
> 
> # Macros
> if_carp_up="carp1.link.up"
> if_carp_down="!carp1.link.up"
> 
> state auto {
>  if $if_carp_up {
>set-state master
>  }
> 
>  if $if_carp_down {
>set-state backup
>  }
> }
> 
> state master {
>  init {
>run "echo master is now `hostname` | mail -s 'carp master changed' root@localhost"
> }
> 
>  if $if_carp_down {
>set-state backup
>  }
> }
> 
> state backup {
>  init {
>run "echo backup is now `hostname` | mail -s 'carp master changed' root@localhost"
>  }
> 
>  if $if_carp_up {
>set-state master
>  }
> }
> 
> erl3-01# cat /etc/pf.conf
> # adopted from https://www.openbsd.org/faq/pf/example1.html
> wan_dev = cnmac0
> lan_dev = cnmac2
> carp_dev = vlan100
> pfsync_dev = cnmac1
> table  { 0.0.0.0/8 10.0.0.0/8 127.0.0.0/8 169.254.0.0/16 \
>172.16.0.0/12 192.0.0.0/24 192.0.2.0/24 224.0.0.0/3 \
>192.168.0.0/16 198.18.0.0/15 198.51.100.0/24\
>203.0.113.0/24 }
> 
> # carp
> pass quick on $lan_dev proto carp keep state (no-sync)
> 
> # pfsync
> pass quick on $pfsync_dev proto pfsync keep state (no-sync)
> 
> set block-policy drop
> set loginterface $wan_dev
> set skip on lo0
> 
> match in all scrub (no-df random-id max-mss 1440)
> 
> # redirect DNS queries to localhost
> pass in quick on { $carp_dev $lan_dev } proto { udp tcp } from any to any port domain rdr-to 192.168.1.253 port domain
> 
> # NAT to the world
> match out on $wan_dev inet from !($wan_dev:network) to any nat-to ($wan_dev:0)
> 
> antispoof quick for { $wan_dev }
> 
> # martians
> block in quick on $wan_dev from  to any
> block return out quick on $wan_dev from any to 
> 
> block all
> 
> # manage buffer bloat
> queue outq on $wan_dev flows 1024 bandwidth 3M max 3M qlimit 1024 default
> queue inq on $lan_dev flows 1024 bandwidth 45M max 45M qlimit 1024 default
> 
> pass out quick inet
> 
> pass in on { $carp_dev $lan_dev } inet
> 



Re: Switching from trunk(4) to aggr(4)

2020-12-15 Thread David Gwynne
On Tue, Dec 15, 2020 at 06:43:12PM -0500, Daniel Jakots wrote:
> On Tue, 15 Dec 2020 14:30:16 +1000, David Gwynne 
> wrote:
> 
> 
> 
> Thanks for your reply!
> 
> Here's what I did (spoiler alert, I couldn't get aggr0 to work):
> 
> I switched back the hostname files, and rebooted.
> 
> During boot:
> 
> starting network
> aggr0 em0 trunkport: creating port
> aggr0 em0 mux: BEGIN (BEGIN) -> DETACHED
> aggr0 em0 rxm: BEGIN (BEGIN) -> INITIALIZE
> aggr0 em0 rxm: INITIALIZE (UCT) -> PORT_DISABLED
> aggr0 em1 trunkport: creating port
> aggr0 em1 mux: BEGIN (BEGIN) -> DETACHED
> aggr0 em1 rxm: BEGIN (BEGIN) -> INITIALIZE
> aggr0 em1 rxm: INITIALIZE (UCT) -> PORT_DISABLED
> aggr0 em2 trunkport: creating port
> aggr0 em2 mux: BEGIN (BEGIN) -> DETACHED
> aggr0 em2 rxm: BEGIN (BEGIN) -> INITIALIZE
> aggr0 em2 rxm: INITIALIZE (UCT) -> PORT_DISABLED
> vlan10: no linkaggr0 em0 rxm: PORT_DISABLED (port_enabled) ->
> EXPIRED .aggr0 em2 rxm: PORT_DISABLED (port_enabled) -> EXPIRED
> aggr0 em1 rxm: PORT_DISABLED (port_enabled) -> EXPIRED
> ..aggr0 em0 rxm: EXPIRED (current_while_timer expired) -> DEFAULTED
> aggr0 em2 rxm: EXPIRED (current_while_timer expired) -> DEFAULTED
> aggr0 em1 rxm: EXPIRED (current_while_timer expired) -> DEFAULTED
> ... sleeping
> 
> root@pancake:~# tcpdump -p -veni em0 -D in
> tcpdump: listening on em0, link-type EN10MB
> 18:04:03.996369 80:56:f2:b7:9c:09 ff:ff:ff:ff:ff:ff 8100 60: 802.1Q vid 70 
> pri 1 arp who-has 10.70.70.254 tell 10.70.70.101
> 18:04:04.016123 00:17:10:8e:44:a5 ff:ff:ff:ff:ff:ff 8100 64: 802.1Q vid 10 
> pri 1 arp who-has 24.48.69.20 tell 24.48.69.1
> 18:04:04.034874 00:17:10:8e:44:a5 ff:ff:ff:ff:ff:ff 8100 64: 802.1Q vid 10 
> pri 1 arp who-has 24.48.69.109 tell 24.48.69.1
> 
> (vlan10 is my uplink to my isp's modem), I didn't have anything but
> those arp who-has.
> 
> root@pancake:~# ifconfig aggr0 -> still no carrier
> 
> root@pancake:~# tcpdump -veni em0 -D in
> tcpdump: listening on em0, link-type EN10MB
> 18:05:11.247455 52:54:00:06:aa:01 00:0d:b9:43:9f:fc 8100 1423: 802.1Q vid 20 
> pri 1 10.10.10.44.5638 > 198.48.202.251.25826: udp 1377 (ttl 64, id 2495, len 
> 1405)
> 18:05:11.248427 52:54:00:06:aa:01 00:0d:b9:43:9f:fc 8100 1390: 802.1Q vid 20 
> pri 1 10.10.10.44.5638 > 198.48.202.251.25826: udp 1344 (ttl 64, id 47470, 
> len 1372)
> 18:05:11.249478 52:54:00:06:aa:01 00:0d:b9:43:9f:fc 8100 1424: 802.1Q vid 20 
> pri 1 10.10.10.44.5638 > 198.48.202.251.25826: udp 1378 (ttl 64, id 57431, 
> len 1406)
> 18:05:11.570690 00:17:10:8e:44:a5 ff:ff:ff:ff:ff:ff 8100 64: 802.1Q vid 10 
> pri 1 arp who-has 184.161.78.225 tell 184.161.78.1
> 18:05:11.586920 00:17:10:8e:44:a5 ff:ff:ff:ff:ff:ff 8100 64: 802.1Q vid 10 
> pri 1 arp who-has 192.222.131.28 tell 192.222.131.1
> 18:05:12.050180 00:17:10:8e:44:a5 ff:ff:ff:ff:ff:ff 8100 64: 802.1Q vid 10 
> pri 1 arp who-has 24.48.76.202 tell 24.48.76.1
> 
> nothing else than those udp packets (my collectd setup) and the
> arp who-has
> 
> root@pancake:~# ifconfig aggr0 -> still no carrier
> 
> At that point I thought "sthen asked me to try to reboot the switch,
> let's do it now" and shortly after I got in my console
> aggr0 em0 rxm: DEFAULTED (!port_enabled) -> PORT_DISABLED
> aggr0 em1 rxm: DEFAULTED (!port_enabled) -> PORT_DISABLED   
> aggr0 em2 rxm: DEFAULTED (!port_enabled) -> PORT_DISABLED
> aggr0 em2 rxm: PORT_DISABLED (port_enabled) -> EXPIRED   
> aggr0 em1 

Re: Switching from trunk(4) to aggr(4)

2020-12-14 Thread David Gwynne



> On 14 Dec 2020, at 08:40, Daniel Jakots  wrote:
> 
> On Sun, 13 Dec 2020 20:34:35 - (UTC), Stuart Henderson
>  wrote:
> 
>> On 2020-12-12, Daniel Jakots  wrote:
>>> I've been using a LACP trunk on my apu (with the three em(4)). On
>>> top of which I have some vlans. I've been doing that for years and
>>> it's working fine.  
>> 
>> I used load-balancing trunk on APU before but stopped when I came to
>> the conclusion that APU running OpenBSD wasn't going to push more
>> than 1Gbps anyway.. (I use failover way more than any type of load
>> balancing)
> 
> Yes but:
> - the three cables between the switch and the APU looks beautiful
> - I don't have to care which if is em0 and which if is em2. Just plug
>  everything.
> :)
> 
>> I don't see anything on the switch side I could change, and the log I
>> have is merely the ports going up or down when I reboot.
>> 
>>> Any idea why aggr(4) stays in no carrier status?  
>> 
>> Do you get any clues from "ifconfig aggr0 debug"?
> 
> I just tried
> # ifconfig aggr0 debug
> # dmesg
> 
> # ifconfig aggr0 down
> # ifconfig aggr0 up
> # ifconfig aggr0 # checked the debug flag was still there
> # dmesg
> 
> 
> I also looked at /var/log/messages to be safe, but nothing relevant.
> 
>> What does the lacp status look like on the switch? (or does it just
>> say 'up' or something and not really have any status?)
> 
> It doesn't say anything about the lacp, it just says the individual
> ports are going up or down (which is normal since I'm rebooting the apu
> to apply the network config change).

Can you try tcpdump -p -veni em0 -D in and see if any LACP packets appear to 
come in on the port? If not, can you remove the -p and see if em0 starts to 
work?

There are two main differences between how aggr(4) and trunk(4) work. The 
first you've already found, which is that trunk(4) uses the address from one of 
the ports it's given, while aggr(4) generates one when it's created. The second 
difference is that trunk(4) makes member ports promisc, while aggr(4) tries to 
be a lot more precise and takes care to program the ports properly. This means 
that in your environment em(4) has to support changing its MAC address to the 
one provided by aggr(4), and it has to support joining multicast groups 
properly, including the one that LACP packets are sent to.

tcpdump with -p means that it won't make the interface promiscuous. If you 
don't see LACP packets come in while the port isn't promisc, that means the 
multicast filter isn't working properly. It should start working if you're 
running tcpdump without -p on the em(4) ports, or on aggr(4) itself.
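Concretely, the comparison might look like this. The ether dst filter is an addition to the commands above: it limits the capture to 01:80:c2:00:00:02, the IEEE Slow Protocols multicast address that LACPDUs are sent to. Each command runs until interrupted.

```sh
#!/bin/sh
# Not promiscuous: only frames accepted by em0's address filters show up.
tcpdump -p -veni em0 -D in ether dst 01:80:c2:00:00:02
# Promiscuous: the multicast filter is bypassed.
tcpdump -veni em0 -D in ether dst 01:80:c2:00:00:02
```

LACPDUs appearing only in the second capture would point at the multicast filter.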

Cheers,
dlg



Re: dhclient on carp

2020-07-23 Thread David Gwynne



> On 23 Jul 2020, at 22:28, Guy Godfroy  wrote:
> 
> Doesn't work better.
> I guess Sebastian is right, carp has to be assigned an IP to come up.

yeah, i just read the code a bit. the carp peers have to be able to talk to 
each other to elect which one is the master and which is the backup. i suggest 
using an address in 169.254.x.y/16 so the carps can elect.
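A sketch of hostname.carp0 along those lines, written from a root shell. The 169.254.11.1/16 address and the "changeme" password are hypothetical placeholders; carp shares them, so the same address, vhid and password would go on both nodes, and the address is only there so the carps can elect a master before dhclient runs.

```sh
#!/bin/sh
# Hypothetical values: 169.254.11.1/16 and "changeme" are placeholders.
cat > /etc/hostname.carp0 <<'EOF'
description "test"
vhid 11 carpdev em1 pass changeme
inet 169.254.11.1 255.255.0.0
dhcp
EOF
# The file holds the carp password, so keep it from being world-readable.
chmod 640 /etc/hostname.carp0
sh /etc/netstart carp0
```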

> 
> On 23/07/2020 at 03:15, David Gwynne wrote:
>>> On 22 Jul 2020, at 22:59, Guy Godfroy  wrote:
>>> 
>>> Hello,
>>> 
>>> So I read in 6.7 release note that it's finally possible to use dhclient on 
>>> CARP interface. That's great news.
>>> 
>>> However, I'm not sure how to use it on a hostname.if file. I tried to 
>>> replace inet instruction directly with dhcp:
>>> 
>>>dhcp vhid 11 carpdev em1 pass  description "test"
>>> 
>>> 
>>> But that didn't do the trick: at boot time, none of my nodes carp were in 
>>> master state so dhclient didn't manage to get any lease.
>>> 
>>> So I have first to give a static IP to my carp in order to activate it, and 
>>> only then trigger dhcp:
>>> 
>>>inet [...] vhid 11 carpdev em1 pass  description "test"
>>> 
>>>dhcp
>>> 
>>> It doesn't feel right. Is there a better way to do this?
>> hostname.if0 lines don't have to all be address configurations. generally 
>> netstart just passes the statements directly to ifconfig.
>> does something like the following work in hostname.carp0?
>> description "test"
>> vhid 11 carpdev em1 pass 
>> dhcp
>> dlg
> 



Re: dhclient on carp

2020-07-22 Thread David Gwynne



> On 22 Jul 2020, at 22:59, Guy Godfroy  wrote:
> 
> Hello,
> 
> So I read in 6.7 release note that it's finally possible to use dhclient on 
> CARP interface. That's great news.
> 
> However, I'm not sure how to use it on a hostname.if file. I tried to replace 
> inet instruction directly with dhcp:
> 
>dhcp vhid 11 carpdev em1 pass  description "test"
> 
> 
> But that didn't do the trick: at boot time, none of my nodes carp were in 
> master state so dhclient didn't manage to get any lease.
> 
> So I have first to give a static IP to my carp in order to activate it, and 
> only then trigger dhcp:
> 
>inet [...] vhid 11 carpdev em1 pass  description "test"
> 
>dhcp
> 
> It doesn't feel right. Is there a better way to do this?

hostname.if0 lines don't have to all be address configurations. generally 
netstart just passes the statements directly to ifconfig.

does something like the following work in hostname.carp0?

description "test"
vhid 11 carpdev em1 pass 
dhcp

dlg



Re: non-checksummed UDP packets

2020-07-20 Thread David Gwynne



> On 20 Jul 2020, at 05:30, Stuart Henderson  wrote:
> 
> On 2020-07-19, obs...@loopw.com  wrote:
>> 
>>> Is this normal?  
>> 
>> Checksum is OPTIONAL in UDP, not required.  This is covered in RFC 768.
> 
> For IPv4, anyway. It's required for v6.

Or is it?

https://tools.ietf.org/html/rfc6935
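For reference, a minimal sketch of the RFC 768 checksum over the IPv4 
pseudo-header (source, destination, protocol 17, UDP length) plus the UDP 
header and data. It shows why a zero checksum can mean "none" on IPv4: a 
computed result of 0 is transmitted as 0xffff, so 0 never appears as a real 
checksum. The function names are illustrative, not from any particular stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add buf to a running sum as 16-bit big-endian words. */
static uint32_t
csum_add(uint32_t sum, const uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
	if (len & 1)
		sum += (uint32_t)(buf[len - 1] << 8);	/* pad odd byte with zero */
	return sum;
}

/* Fold the carries back in and take the one's complement. */
static uint16_t
csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * UDP checksum for IPv4. udp points at the UDP header (with its checksum
 * field set to zero) followed by the payload; src/dst are in host order.
 */
static uint16_t
udp4_checksum(uint32_t src, uint32_t dst, const uint8_t *udp, size_t udplen)
{
	uint32_t sum = 0;
	uint16_t csum;

	assert(udp != NULL && udplen >= 8 && udplen <= 0xffff);

	sum += src >> 16;
	sum += src & 0xffff;
	sum += dst >> 16;
	sum += dst & 0xffff;
	sum += 17;			/* protocol number for UDP */
	sum += (uint32_t)udplen;	/* UDP length from the pseudo-header */
	sum = csum_add(sum, udp, udplen);

	csum = csum_fold(sum);
	/* 0 on the wire means "no checksum", so a computed 0 is sent as 0xffff. */
	return csum == 0 ? 0xffff : csum;
}
```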



Re: using aggr interface instead of trunk

2020-05-19 Thread David Gwynne



> On 14 May 2020, at 4:22 pm, mabi  wrote:
> 
> Hi Iain,
> 
> ‐‐‐ Original Message ‐‐‐
> On Wednesday, May 13, 2020 7:55 PM, Iain R. Learmonth  wrote:
> 
>> More details are at: https://marc.info/?l=openbsd-cvs&m=156229058006706&w=2
> 
> I actually already read that one after seeing the announcement on 
> undeadly.org iirc ;)
> 
>> Assuming you mean trunk, not tun, yes.
> 
> Right, thanks for spotting that, I meant trunk of course.
> 
>> I don't see mention of any aggr fixes in the 6.7 changelog, so I guess it 
>> didn't have any disasters in it. Others are using it on production systems.
> 
> Nice to hear that, I will give it a shot as soon as I upgrade to 6.6 my HA 
> CARP cluster of two OpenBSD firewalls. I might first try using it on one of 
> the two firewalls so that I can easily switch to the other firewall in any 
> case of issue.

I would wait for 6.7 before using aggr(4) in production. Now that 6.7 is out, 
there's no reason not to run it instead of 6.6.

dlg



Re: small aggr problem ( on current )

2019-12-22 Thread David Gwynne
On Thu, Dec 19, 2019 at 01:59:30PM +0100, Hrvoje Popovski wrote:
> On 15.12.2019. 23:01, Hrvoje Popovski wrote:
> > On 15.12.2019. 12:45, Holger Glaess wrote:
> >> hi
> >>
> >>
> >> runing version
> >>
> >>
> >> /etc 16>dmesg | more
> >> Copyright (c) 1982, 1986, 1989, 1991, 1993
> >>     The Regents of the University of California.  All rights 
> >> reserved.
> >> Copyright (c) 1995-2019 OpenBSD. All rights reserved.
> >> https://www.OpenBSD.org
> >>
> >> OpenBSD 6.6-current (GENERIC.MP) #48: Tue Dec 10 16:30:01 MST 2019
> >> dera...@octeon.openbsd.org:/usr/src/sys/arch/octeon/compile/GENERIC.MP
> >>
> >>
> >>
> >> after a reboot the aggr interface do not aggregate the connection with
> >> the switch,
> >>
> >> just after an physical disaconnection from the ethernet cable , wait for
> >> some sec,
> >>
> >> and replugin .
> >>
> >>
> >> the the iterface are up and active, before ifconfig says "no carrier"
> >> but the interfaces have
> >>
> >> carrier.
> >>
> >> i dont have the problem with the trunk interface on the same hardware.
> >>
> >>
> >> you are on bellab as root
> >> /etc 20>cat /etc/hostname.cnmac1
> >> mtu 1518
> >> up
> >>
> >> 12:43:59 Sun Dec 15
> >> you are on bellab as root
> >> /etc 21>cat /etc/hostname.cnmac2
> >> mtu 1518
> >> up
> >>
> >> 12:44:01 Sun Dec 15
> >> you are on bellab as root
> >> /etc 22>cat /etc/hostname.aggr0
> >> trunkport cnmac1
> >> trunkport cnmac2
> >> mtu 1518
> >> up
> >>
> >>
> >> holger
> >>
> >>
> >>
> > Hi,
> > 
> > maybe logs below would help for further troubleshooting because i'm
> > seeing same behavior.
> > 
> > when i add debug statement in hostname.agg0 and boot box i'm getting
> > this log
> > 
> > starting network
> > aggr0 ix0 rxm: LACP_DISABLED (LACP_Enabled) -> PORT_DISABLED
> > aggr0 ix0: selection logic: unselected (rxm !CURRENT)
> > aggr0 ix1 rxm: LACP_DISABLED (LACP_Enabled) -> PORT_DISABLED
> > aggr0 ix1: selection logic: unselected (rxm !CURRENT)
> > aggr0 ix2 rxm: LACP_DISABLED (LACP_Enabled) -> PORT_DISABLED
> > aggr0 ix2: selection logic: unselected (rxm !CURRENT)
> > aggr0 ix3 rxm: LACP_DISABLED (LACP_Enabled) -> PORT_DISABLED
> > aggr0 ix3: selection logic: unselected (rxm !CURRENT)
> > reordering libraries: done.
> > 
> > after boot aggr status is "no carrier"
> > sh /etc/netstart isn't helping
> > 
> > but with ifconfig ix0-ix4 down/up aggr interface start to work normally
> > 
> > log when doing ifconfig ix0-ix4 down/up
> 
> 
> just a little follow up:
> 
> i've tested aggr on two boxes. first box is dell r620 and second one is
> supermicro SYS-5018D-FN8T. both boxes are connected to dell s4810
> switch. Same cables, same ports, same port-channles on switch, timeout
> fast or slow, both with ix 82599 interfaces ... (x552 ix interfaces are
> disabled on supermicro box) ...
> 
> r620 is working without any problems and supermicro box is having same
> problem as described above...
> 
> trunk interface are working on both boxes without any problem ..
> 
> 
> this is fun :)

:/

can you try this diff?

Index: if_aggr.c
===
RCS file: /cvs/src/sys/net/if_aggr.c,v
retrieving revision 1.19
diff -u -p -r1.19 if_aggr.c
--- if_aggr.c   5 Aug 2019 10:42:51 -0000   1.19
+++ if_aggr.c   23 Dec 2019 04:50:30 -0000
@@ -2401,8 +2401,7 @@ aggr_up(struct aggr_softc *sc)
 
TAILQ_FOREACH(p, &sc->sc_ports, p_entry) {
aggr_rxm(sc, p, LACP_RXM_E_LACP_ENABLED);
-
-   aggr_selection_logic(sc, p);
+   aggr_p_linkch(p);
}
 
/* start the Periodic Transmission machine */



Re: ipv6 via he.net connectivity issues - possible regression?

2019-12-13 Thread David Gwynne
aggr(4) didn't exist in OpenBSD 6.5, so maybe that's the difference. Does the 
problem go away if you use trunk(4) instead of aggr(4)? Alternatively, could 
you build a -current kernel and make sure you have src/sys/net/if_aggr.c r1.25 
and see what effect that has?

Cheers,
dlg

> On 13 Dec 2019, at 8:06 am, Pedro Caetano  
> wrote:
> 
> Hi misc,
> 
> I'm running amd64 -current, snapshot #518.
> 
> My router has 4 em(4) interfaces.
> em0 provides ipv4 internet via vlan100 which is connected to ISP ont.
> em1, em2, em3 are bonded using aggr(4) to a lacp capable switch.
> 
> A /48 subnet is routed via gif(4) tunnel to he.net, then subnetted into
> /64s.
> 
> Three vlans exist on top of the aggr(4) device.
> Ipv4 addresses are assigned by dhcpd(8), ipv6 addresses are assigned by
> rad(8).
> 
> Hosts can acquire ip via rad(8), but are unable to access the internet
> unless the gateway is pinged.
> Hosts are also unreachable from the internet.
> 
> Unfortunately I cannot tell precisely when this behavior started, but I
> guess this was not an issue on 6.5.
> 
> Please let me know if any more information is needed.
> 
> Best regards,
> Pedro Caetano



Re: issues configuring vlan on top of aggr device

2019-12-05 Thread David Gwynne
On Tue, Dec 03, 2019 at 02:11:16PM +, Pedro Caetano wrote:
> Hi again,
> 
> I'm sorry, but since the boxes do not (yet) have working networking it is
> not easy for me to get the text output.
> I'm attaching a few pictures with the requested output.
> 
> https://picpaste.me/images/2019/12/03/cat_hostname.vl3800_hostname.aggr0.jpg
> https://picpaste.me/images/2019/12/03/ifconfig_vl3800.jpg
> https://picpaste.me/images/2019/12/03/ifconfig_aggr0.jpg
> 
> 
> Best regards,
> Pedro Caetano
> 
> On Tue, Dec 3, 2019 at 12:35 PM Hrvoje Popovski  wrote:
> 
> > On 3.12.2019. 13:15, Pedro Caetano wrote:
> > > Hi Hrvoje, thank you for the fast reply,
> > >
> > > Unfortunately I have the same behavior.
> > > The aggr0 works as expected, as I can see the links bonded on the switch.
> > > I'm able to se the correct vid s, when tcpdump'ing the aggr0 interface.
> > >
> > > I'd appreciate any help on this topic.
> > >
> >
> > can you send ifconfig aggr0 and ifconfig vlan3800 ?
> >
> >
> >
> >
> > > This configuration is working on -current with em(4) nics.
> > >
> > >
> > > Best regards,
> > > Pedro Caetano
> > >
> > > On Tue, 3/12/2019, 12:01, Hrvoje Popovski wrote:
> > >
> > > On 3.12.2019. 12:21, Pedro Caetano wrote:
> > > > Hi misc@
> > > >
> > > > I'm running openbsd 6.6 with latest patches running on a pair of
> > > hp dl 360
> > > > gen6 servers.
> > > >
> > > > I'm attempting to configure an aggr0 device towards a cat 3650.
> > > >
> > > > The aggr0 associates successfully with the switch, but I'm unable
> > > to run
> > > > vlans on top of it.
> > > >
> > > > The configuration on openbsd is the following:
> > > > #ifconfig aggr0 create
> > > > #ifconfig aggr0 trunkport bnx0
> > > > #ifconfig aggr0 trunkport bnx1
> > >
> > > add this - ifconfig aggr0 up
> > > if you have hostname.aggr0 add "up" at the end of that file ...
> > >
> > > > #ifconfig vlan3800 create
> > > > #ifconfig vlan3800 vnetid 3800
> > > > #ifconfig vlan3800 parent aggr0
> > > > #ifconfig vlan3800 10.80.253.10/24 
> > > > ifconfig: SIOCAIFADDR: No buffer space available.

hey,

hrvoje gave me a heads up about this, and i came up with some diffs
which seem to help according to his testing.

the most useful for you using aggr is this diff for bnx, which enables
the use of jumbos. it's pretty mechanical, except that it stops
advertising the VLAN_MTU capability. instead it advertises what the
actual hardmtu is, which allows the extra 4 bytes to be used by any
protocol, not just vlan(4).

aggr(4) does not (currently) pass the VLAN_MTU capability from its
ports through for vlan(4) to use, but passing the larger hardmtu through
has the same effect.

unless anyone objects, i'm going to commit this tomorrow.

fyi, ifconfig foo0 hwfeatures is how you see the capabilities and
hardmtu settings.
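A simplified model of the MTU arithmetic, assuming vlan(4) compares its own 
MTU against the parent's hardmtu and subtracts the 4-byte 802.1Q tag when the 
parent doesn't advertise IFCAP_VLAN_MTU. Failing that comparison would be 
consistent with the "No buffer space available" (ENOBUFS) error above. The 
struct and function are hypothetical, not the if_vlan.c code, and the 9000 
byte hardmtu used in the checks is illustrative rather than bnx(4)'s actual 
limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define EVL_ENCAPLEN	4	/* bytes added by an 802.1Q tag */

/* Hypothetical summary of what a parent interface advertises. */
struct parent_caps {
	unsigned int	hardmtu;	/* largest payload the parent can carry */
	bool		vlan_mtu;	/* IFCAP_VLAN_MTU: tag fits on top of hardmtu */
};

/*
 * Model of the check a vlan child is subject to: returns 0 if vlan_mtu
 * fits on the parent, ENOBUFS otherwise.
 */
static int
vlan_mtu_check(const struct parent_caps *p, unsigned int vlan_mtu)
{
	unsigned int hardmtu;

	assert(p != NULL && p->hardmtu >= EVL_ENCAPLEN);
	hardmtu = p->hardmtu;
	if (!p->vlan_mtu)
		hardmtu -= EVL_ENCAPLEN;	/* the tag eats into the payload */

	return vlan_mtu > hardmtu ? ENOBUFS : 0;
}
```

In this model, an aggr0 over 1500 byte ports without VLAN_MTU can only offer 
1496 bytes to vlan3800, while a larger hardmtu from the ports lifts the limit 
without aggr(4) having to pass the capability flag through.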

Index: if_bnx.c
===
RCS file: /cvs/src/sys/dev/pci/if_bnx.c,v
retrieving revision 1.125
diff -u -p -r1.125 if_bnx.c
--- if_bnx.c   10 Mar 2018 10:51:46 -0000  1.125
+++ if_bnx.c   5 Dec 2019 09:52:04 -0000
@@ -875,12 +875,13 @@ bnx_attachhook(struct device *self)
ifp->if_ioctl = bnx_ioctl;
ifp->if_qstart = bnx_start;
ifp->if_watchdog = bnx_watchdog;
+   ifp->if_hardmtu = BNX_MAX_JUMBO_ETHER_MTU_VLAN -
+   sizeof(struct ether_header);
IFQ_SET_MAXLEN(&ifp->if_snd, USABLE_TX_BD - 1);
bcopy(sc->eaddr, sc->arpcom.ac_enaddr, ETHER_ADDR_LEN);
bcopy(sc->bnx_dev.dv_xname, ifp->if_xname, IFNAMSIZ);
 
-   ifp->if_capabilities = IFCAP_VLAN_MTU | IFCAP_CSUM_TCPv4 |
-   IFCAP_CSUM_UDPv4;
+   ifp->if_capabilities = IFCAP_CSUM_TCPv4 | IFCAP_CSUM_UDPv4;
 
 #if NVLAN > 0
ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING;
@@ -2417,7 +2418,7 @@ bnx_dma_alloc(struct bnx_softc *sc)
 */
for (i = 0; i < TOTAL_TX_BD; i++) {
if (bus_dmamap_create(sc->bnx_dmatag,
-   MCLBYTES * BNX_MAX_SEGMENTS, BNX_MAX_SEGMENTS,
+   BNX_MAX_JUMBO_ETHER_MTU_VLAN, BNX_MAX_SEGMENTS,
MCLBYTES, 0, BUS_DMA_NOWAIT, &sc->tx_mbuf_map[i])) {
printf(": Could not create Tx mbuf %d DMA map!\n", 1);
rc = ENOMEM;
@@ -2650,8 +2651,8 @@ bnx_dma_alloc(struct bnx_softc *sc)
 * Create DMA maps for the Rx buffer mbufs.
 */
for (i = 0; i < TOTAL_RX_BD; i++) {
-   if (bus_dmamap_create(sc->bnx_dmatag, BNX_MAX_MRU,
-   BNX_MAX_SEGMENTS, BNX_MAX_MRU, 0, BUS_DMA_NOWAIT,
+   if (bus_dmamap_create(sc->bnx_dmatag, BNX_MAX_JUMBO_MRU,
+   1, BNX_MAX_JUMBO_MRU, 0, BUS_DMA_NOWAIT,
&sc->rx_mbuf_map[i])) {
printf(": Could not create Rx mbuf %d DMA map!\n", i);
rc = ENOMEM;
@@ 

Re: Changes to VLAN and promiscuous mode in 6.6

2019-11-03 Thread David Gwynne
Hey,

This should be fixed in -current as of r1.199 of src/sys/net/if_vlan.c

Sorry for the inconvenience.

Cheers,
dlg

> On 29 Oct 2019, at 19:49, Zé Loff  wrote:
> 
> 
> Hi all
> 
> Some changes in VLAN-related code went into 6.6 and I think some of them
> changed the way the parent interface gets into promiscuous mode.  Let me
> try to explain...
> 
> Our ISP provides internet and VoIP over two separate VLANs (100 and 101,
> respectively).  Our external firewall has two physical interfaces re0,
> and re1, and also does the filtering and NATing for internet, but VoIP
> traffic is transparently forwarded to the VoIP phone.  So it's something
> like this:
> 
> GPON -> re0 -+--> vlan100  -> (PF/NAT) -> vlan90   -+-> re1 -> A switch
>  \-> vlan1010 -> bridge1  -> vlan1011 -/
> 
> The VoIP phone connected to the switch, which does all the appropriate
> tagging and untagging.  re0 and re1 have no IP addresses, neither do the
> vlan1010, vlan1011 and bridge1 virtual interfaces.  The VoIP phone gets
> configured by DHCP, and gets its address (and etc) from the ISP.  All
> interfaces are up, and correctly configured (ifconfigs below).  This
> worked fine up until the 6.6 upgrade.
> 
> Now, if things are left alone, the phone fails to get DHCP replies.
> This can be checked by running "tcpdump -i re1 vlan 101", which clearly
> shows the DHCP requests coming from the phone, but getting no replies.
> Exactly the same is seen on vlan1011 and vlan1010 (i.e. on both sides of
> the bridge1): DHCP requests but no replies.  If tcpdump is run on re0
> ("tcpdump -i re0 vlan 101") then the interface goes into promiscuous
> mode and the DHCP replies start flowing from the ISP and the phone
> finally gets configured.  Crucially, if the "-p" flag is added to
> tcpdump (i.e. not putting the if in promiscuous mode), DHCP fails.
> 
> Is this behaviour intended and, if so, can re0 be configured to stay in
> promiscuous mode without having to do something silly as tcpdump'ing
> into /dev/null?
> 
> Thanks in advance
> Zé
> 
> -- 
> 
> # ifconfig -A
> lo0: flags=8049 mtu 32768
>index 5 priority 0 llprio 3
>groups: lo
>inet6 ::1 prefixlen 128
>inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
>inet 127.0.0.1 netmask 0xff000000
> re0: flags=8b43 mtu 
> 1500
>lladdr 00:0d:b9:3c:b0:e8
>index 1 priority 0 llprio 3
>media: Ethernet autoselect (1000baseT full-duplex,master)
>status: active
> re1: flags=8843 mtu 9100
>lladdr 00:0d:b9:3c:b0:e9
>index 2 priority 0 llprio 3
>media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
>status: active
> re2: flags=8802 mtu 1500
>lladdr 00:0d:b9:3c:b0:ea
>index 3 priority 0 llprio 3
>media: Ethernet autoselect (10baseT half-duplex)
>status: no carrier
> enc0: flags=0<>
>index 4 priority 0 llprio 3
>groups: enc
>status: active
> bridge1: flags=41
>index 6 llprio 3
>groups: bridge
>priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rstp
>vlan1011 flags=3
>port 11 ifpriority 0 ifcost 0
>vlan1010 flags=3
>port 10 ifpriority 0 ifcost 0
>Addresses (max cache: 100, timeout: 240):
>00:00:5e:00:01:c9 vlan1010 1 flags=0<>
>80:5e:c0:12:3f:80 vlan1011 1 flags=0<>
> vlan100: flags=808843 mtu 
> 1500
>lladdr 00:0d:b9:3c:b0:e8
>description: WAN
>index 9 priority 0 llprio 3
>encap: vnetid 100 parent re0 txprio packet rxprio outer
>groups: vlan egress
>media: Ethernet autoselect (1000baseT full-duplex,master)
>status: active
>inet 148.69.164.57 netmask 0xfffffc00 broadcast 148.69.167.255
>inet 148.69.143.1 netmask 0xfffffffc broadcast 148.69.143.3
> vlan1010: flags=8943 mtu 1500
>lladdr 00:0d:b9:3c:b0:e8
>description: VoIP WAN
>index 10 priority 0 llprio 3
>encap: vnetid 101 parent re0 txprio packet rxprio outer
>groups: vlan
>media: Ethernet autoselect (1000baseT full-duplex,master)
>status: active
> vlan1011: flags=8943 mtu 1500
>lladdr 00:0d:b9:3c:b0:e9
>description: VoIP DMZ
>index 11 priority 0 llprio 3
>encap: vnetid 101 parent re1 txprio packet rxprio outer
>groups: vlan
>media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
>status: active
> vlan90: flags=8843 mtu 9000
>lladdr 00:0d:b9:3c:b0:e9
>description: DMZ
>index 14 priority 0 llprio 3
>encap: vnetid 90 parent re1 txprio packet rxprio outer
>groups: vlan
>media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
>status: active
>inet 10.17.16.1 netmask 0xfffffe00 broadcast 10.17.17.255
> pflog0: flags=141 mtu 33136
>index 15 priority 0 llprio 3
>groups: pflog
> 
> 
> 

Re: 6.6-beta (RAMDISK_CD) #281 hangs on fsck

2019-09-09 Thread David Gwynne
This should be fixed in -current now. A snapshot should pick it up in a day or 
so. Sorry for the inconvenience.

Cheers,
dlg

> On 9 Sep 2019, at 11:08 am, Luke Small  wrote:
> 
> Yay!
> -Luke
> 
> 
> On Sun, Sep 8, 2019 at 8:07 PM David Gwynne  wrote:
> I think I see the problem. We're going to try and test this locally and will 
> hopefully have something committed in a few hours time.
> 
> dlg
> 
> > On 9 Sep 2019, at 10:33, Luke Small  wrote:
> > 
> > I have mfii too:
> > dmesg | grep mfii:
> > 
> > mfii0 at pci11 dev 0 function 0 "Symbios Logic MegaRAID SAS2208" rev 0x05:
> > msi
> > mfii0: "LSI MegaRAID SAS 9271-8i", firmware 23.28.0-0010, 1024MB cache
> > scsibus1 at mfii0: 64 targets
> > scsibus2 at mfii0: 256 targets
> > 
> >> On 8.9.2019. 18:19, Luke Small wrote:
> >>> It doesn't work for me on the
> >>> ftp.hostserver.de/archive/2019-08-29-0105/amd64/
> >>> bsd.rd!
> >> 
> >> 
> >> Hi,
> >> 
> >> do you maybe have mfii on that box ?
> >> 
> >> I'm having same problem as Mischa and i have mfii. with bsd.rd fsck
> >> stops with this command
> >> 
> >> Which disk is the root disk? ('?' for details) [sd0] sd0
> >> Checking root filesystem (fsck -fp /dev/sd0a)...
> >> 
> >> On other boxes without mfii bsd.rd and sysupgrade works just fine..
> >> 
> >> between 27.08 and 29.8 i saw this commit
> >> 
> >> Changes by:  d...@cvs.openbsd.org 2019/08/27 22:55:51
> >> 
> >> Modified files:
> >>  sys/dev/pci: mfii.c
> >> 
> >> Log message:
> >> implement a DV_POWERDOWN handler to flush cache and shutdown the controller
> >> 
> >> this has been in snaps for the last week without issue, and has
> >> been running in production on a bunch of my boxes for a week before
> >> that, also without issue.
> >> 
> >> 
> >> 
> 



Re: 6.6-beta (RAMDISK_CD) #281 hangs on fsck

2019-09-08 Thread David Gwynne
I think I see the problem. We're going to try and test this locally and will 
hopefully have something committed in a few hours time.

dlg

> On 9 Sep 2019, at 10:33, Luke Small  wrote:
> 
> I have mfii too:
> dmesg | grep mfii:
> 
> mfii0 at pci11 dev 0 function 0 "Symbios Logic MegaRAID SAS2208" rev 0x05:
> msi
> mfii0: "LSI MegaRAID SAS 9271-8i", firmware 23.28.0-0010, 1024MB cache
> scsibus1 at mfii0: 64 targets
> scsibus2 at mfii0: 256 targets
> 
>> On 8.9.2019. 18:19, Luke Small wrote:
>>> It doesn't work for me on the
>>> ftp.hostserver.de/archive/2019-08-29-0105/amd64/
>>> bsd.rd!
>> 
>> 
>> Hi,
>> 
>> do you maybe have mfii on that box ?
>> 
>> I'm having same problem as Mischa and i have mfii. with bsd.rd fsck
>> stops with this command
>> 
>> Which disk is the root disk? ('?' for details) [sd0] sd0
>> Checking root filesystem (fsck -fp /dev/sd0a)...
>> 
>> On other boxes without mfii bsd.rd and sysupgrade works just fine..
>> 
>> between 27.08 and 29.8 i saw this commit
>> 
>> Changes by:  d...@cvs.openbsd.org 2019/08/27 22:55:51
>> 
>> Modified files:
>>  sys/dev/pci: mfii.c
>> 
>> Log message:
>> implement a DV_POWERDOWN handler to flush cache and shutdown the controller
>> 
>> this has been in snaps for the last week without issue, and has
>> been running in production on a bunch of my boxes for a week before
>> that, also without issue.
>> 
>> 
>> 



Re: Controlling OSPFD based on HAProxy state

2019-04-24 Thread David Gwynne
I've used relayd to insert routes to a service based on a health check, and
then had ospfd advertise those routes.  That might be good enough for you.

On Fri., 19 Apr. 2019, 00:40 Henry Bonath,  wrote:

> Does anyone suggest any clever way of controlling OSPFD based on the
> status of an HAProxy process?
>
> I like to use OSPFD to advertise /32 loopback IPs which HAProxy binds
> to for anycasted highly-available Reverse Proxy/Load Balancer
> services.
>
> This works great if the whole box goes down, as OSPF would no longer
> be advertising from that site, but if the HAProxy process fails for
> some reason, then it just goes down as the IP will stay in the OSPF
> table.
>
> I know there are tools like monit or supervisord which may help with
> this, but I wanted to see if anyone here may have any ideas on how to
> achieve this that I may be overlooking.
>
> Thanks!
> -Henry
>
>


Re: Viewing SFP diagnostic data in OpenBSD ?

2019-04-07 Thread David Gwynne



> On 6 Apr 2019, at 01:54, Rachel Roch  wrote:
> 
> 
> 
> 
> Apr 2, 2019, 11:19 PM by da...@gwynne.id.au:
> 
>> 
>> 
>>> On 3 Apr 2019, at 04:52, Stuart Henderson <s...@spacehopper.org> wrote:
>>> 
>>> On 2019-04-02, Rachel Roch <rr...@tutanota.de> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Hopefully I'm just searching the man pages wrong but I can't seem to find 
>>>> any hints as to how I can view SFP diagnostics in OpenBSD (i.e. light 
>>>> power etc.)
>>>> 
>>>> Perhaps someone could kindly point me in the right direction ?
>>>> 
>>>> Rachel
>>>> 
>>> 
>>> I don't think that code has been written yet.
>>> 
>> 
>> You're right, it hasn't.
>> 
>> Rachel, which nic are you interested in having this on?
>> 
>> dlg
>> 
> 
> Just spotted this email.
> 
> An Intel I350 based NIC made by HotLava  
> (https://hotlavasystems.com/products_gbe.html) 
> 

OK. I made a start on this. Have a look for "sfp module info and diagnostics" 
on tech@, or click on https://marc.info/?l=openbsd-tech&m=155469738013008&w=2

We don't have an em(4) here with optics, but an em(4) diff doesn't look too 
hard to write if you're willing to test it.
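For a sense of what these diagnostics involve, a minimal sketch that decodes 
a few SFF-8472 digital diagnostic fields from the 0xA2 page, assuming an 
internally calibrated module (externally calibrated modules need their 
calibration constants applied first). Offsets and units are from SFF-8472; 
the function names are hypothetical and unrelated to the tech@ diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets into the 256-byte SFF-8472 diagnostics page (I2C address 0xA2). */
#define SFF8472_TEMP		96	/* signed, 1/256 degree C per bit */
#define SFF8472_VCC		98	/* unsigned, 100 uV per bit */
#define SFF8472_TX_POWER	102	/* unsigned, 0.1 uW per bit */
#define SFF8472_RX_POWER	104	/* unsigned, 0.1 uW per bit */

/* Read a big-endian 16-bit field from the page. */
static uint16_t
sff_be16(const uint8_t *page, size_t off)
{
	assert(page != NULL && off + 1 < 256);
	return (uint16_t)((page[off] << 8) | page[off + 1]);
}

/* Module temperature in degrees C (two's complement field). */
static double
sff_temp_c(const uint8_t *page)
{
	int32_t v = sff_be16(page, SFF8472_TEMP);

	if (v >= 0x8000)
		v -= 0x10000;
	return v / 256.0;
}

/* Supply voltage in volts. */
static double
sff_vcc_v(const uint8_t *page)
{
	return sff_be16(page, SFF8472_VCC) / 10000.0;
}

/* Optical power in microwatts; dBm = 10 * log10(uW / 1000). */
static double
sff_power_uw(const uint8_t *page, size_t off)
{
	return sff_be16(page, off) / 10.0;
}
```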

dlg



Re: Viewing SFP diagnostic data in OpenBSD ?

2019-04-04 Thread David Gwynne
you have em(4) with sfp?

> On 4 Apr 2019, at 18:55, Marco Prause  wrote:
> 
> I second that +1 for ix, but em would also be nice ;-)
> 
> 
> On 03.04.19 00:40, Tom Smyth wrote:
>> +1 for me also :)  ix :)
>> 
>> On Tue, 2 Apr 2019 at 23:38, Stuart Henderson  wrote:
>> 
>>>  :-)
>>> 
> 



Re: Trouble forwarding between mpw's in bridge (6.4)

2019-04-02 Thread David Gwynne
Thanks to Mitchell for figuring this out.

> On 3 Apr 2019, at 05:25, Lee Nelson  wrote:
> 
> Since Mitchell's last email, this appeared from CVS in the place where
> the patch was supposed to be applied:
> 
> CLR(m0->m_flags, M_BCAST|M_MCAST);
> 
> I skipped the patch and compiled the kernel with the source as I found
> it from CVS.  With this new kernel everything works as I expected. arp
> broadcast requests coming into the bridge on one mpw are being seen by
> the router on the other mpw and arp replies are getting back to the
> requesting router.
> 
> Thank you to everyone!!!
> 
> On Tue, Apr 2, 2019 at 4:52 AM Mitchell Krome  wrote:
>> 
>> 
>> 
>> On 2/04/2019 7:57 pm, Mitchell Krome wrote:
>>> 
>>> 
>>> On 2/04/2019 7:24 pm, David Gwynne wrote:
>>>> 
>>>> 
>>>>> On 2 Apr 2019, at 6:41 pm, Mitchell Krome  wrote:
>>>>> 
>>>>> On 2/04/2019 2:08 pm, David Gwynne wrote:
>>>>>> Can you send me the hostname.* files and the output of ifconfig (showing 
>>>>>> all interfaces)?
>>>>>> 
>>>>>> You're using -current now, right?
>>>>>> 
>>>>>> dlg
>>>>>> 
>>>>>>> On 2 Apr 2019, at 08:15, lnel...@nelnet.org wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> First of all the protected domain seems to do the opposite of what I
>>>>>>> need, but it may only appear to be the case because of the strageness
>>>>>>> with broadcast.  When trying to ping (or send any traffic) between
>>>>>>> rtr01 and rtr02 and the two mpw2's are in the same protected domain,
>>>>>>> the arp requests die in the bridge.  The arp never shows up at all on
>>>>>>> the other mpw. If I remove the mpw's from the protected domain, then
>>>>>>> the arp traffic gets through to the other mpw, but it doesn't get sent
>>>>>>> out properly by MPLS.  It's sent out as MPLS broadcast traffic
>>>>>>> originating on the physical ethernet interface but with the right label
>>>>>>> for the pseudowire. Even though the arp request itself is broadcast
>>>>>>> traffic, I would expect it to be encapsulated in a unicast MPLS packet
>>>>>>> which is sent from the MAC of the bridge or the originating router and
>>>>>>> and sent as unicast to the destination router with the pseudowire's
>>>>>>> label.  As it is now, even if the destination router could figure out
>>>>>>> what to do with these MPLS broadcast packets, it would respond to the
>>>>>>> physical interface and not the bridge.
>>>>> 
>>>>> You only need the protected domain if you do a full mesh vpls (I.E.
>>>>> every router has a mpw to every other router). That wasn't the config
>>>>> you showed initially so I don't think you need it in your case.
>>>>> 
>>>>> I am running the following diff to get MPLS to work with GRE as I had a
>>>>> similar ARP issue that was caused by gre_input tagging the packets as
>>>>> MCAST and then mpls_input dropping them. When I looked into it I didn't
>>>>> think that should cause the issue I was seeing for a real interface as
>>>>> ether_input didn't re-add the MCAST flag, but I also don't have a real
>>>>> box to test on. You can give it a go and see if it helps.
>>>> 
>>>> I think you've found the problem. mpls_output replaces if_output though, 
>>>> so for interfaces with mpls enabled on this, this change causes 
>>>> BCAST|MCAST to be cleared for all outgoing packets. ie, it might break 
>>>> things like ipv6 nd on ethernet interfaces.
>>> 
>>> Yeah I had no idea what the impact of that change was, it seemed like a
>>> hack when I wrote it...
>>> 
>>>> 
>>>> What are you running on top of GRE that hit this?
>>> 
>>> I have a vpls over GRE. And I had some weird behaviour where arp was
>>> being dropped only on paths that skipped the outer MPLS label. I.E.
>>> we're directly connected to the next-hop and implicit null means we
>>> never add the LSP label, only the service label. Thanks to tcpdump not
>>> knowing about multicast MPLS over GRE and printing weirdness I worked
>>> out what was going on and tracked it down to this.

Re: Viewing SFP diagnostic data in OpenBSD ?

2019-04-02 Thread David Gwynne



> On 3 Apr 2019, at 04:52, Stuart Henderson  wrote:
> 
> On 2019-04-02, Rachel Roch  wrote:
>> Hi,
>> 
>> Hopefully I'm just searching the man pages wrong but I can't seem to find 
>> any hints as to how I can view SFP diagnostics in OpenBSD (i.e. light power 
>> etc.)
>> 
>> Perhaps someone could kindly point me in the right direction ?
>> 
>> Rachel
>> 
>> 
> 
> I don't think that code has been written yet.

You're right, it hasn't.

Rachel, which nic are you interested in having this on?

dlg



Re: Trouble forwarding between mpw's in bridge (6.4)

2019-04-02 Thread David Gwynne



> On 2 Apr 2019, at 6:41 pm, Mitchell Krome  wrote:
> 
> On 2/04/2019 2:08 pm, David Gwynne wrote:
>> Can you send me the hostname.* files and the output of ifconfig (showing all 
>> interfaces)?
>> 
>> You're using -current now, right?
>> 
>> dlg
>> 
>>> On 2 Apr 2019, at 08:15, lnel...@nelnet.org wrote:
>>> 
>>> 
>>> First of all the protected domain seems to do the opposite of what I
>>> need, but it may only appear to be the case because of the strageness
>>> with broadcast.  When trying to ping (or send any traffic) between
>>> rtr01 and rtr02 and the two mpw2's are in the same protected domain,
>>> the arp requests die in the bridge.  The arp never shows up at all on
>>> the other mpw. If I remove the mpw's from the protected domain, then
>>> the arp traffic gets through to the other mpw, but it doesn't get sent
>>> out properly by MPLS.  It's sent out as MPLS broadcast traffic
>>> originating on the physical ethernet interface but with the right label
>>> for the pseudowire. Even though the arp request itself is broadcast
>>> traffic, I would expect it to be encapsulated in a unicast MPLS packet
>>> which is sent from the MAC of the bridge or the originating router and
>>> and sent as unicast to the destination router with the pseudowire's
>>> label.  As it is now, even if the destination router could figure out
>>> what to do with these MPLS broadcast packets, it would respond to the
>>> physical interface and not the bridge.
> 
> You only need the protected domain if you do a full mesh vpls (I.E.
> every router has a mpw to every other router). That wasn't the config
> you showed initially so I don't think you need it in your case.
> 
> I am running the following diff to get MPLS to work with GRE as I had a
> similar ARP issue that was caused by gre_input tagging the packets as
> MCAST and then mpls_input dropping them. When I looked into it I didn't
> think that should cause the issue I was seeing for a real interface as
> ether_input didn't re-add the MCAST flag, but I also don't have a real
> box to test on. You can give it a go and see if it helps.

I think you've found the problem. mpls_output replaces if_output though, so on 
interfaces with mpls enabled, this change causes BCAST|MCAST to be cleared for 
all outgoing packets. ie, it might break things like ipv6 nd on ethernet 
interfaces.

What are you running on top of GRE that hit this?

For now it might be better to have mpw etc clear the flags before calling 
mpls_output.
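A userland toy model of that trade-off, with hypothetical names and flag 
values (not the real sys/mbuf.h or sys/netmpls code): clearing the flags in 
the shared output routine would strip them from every packet on the 
interface, including IPv6 ND multicast, while clearing them in the 
pseudowire path only touches frames that are being tunnelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; the real ones live in sys/mbuf.h. */
#define PKT_BCAST	0x01
#define PKT_MCAST	0x02

#ifndef CLR
#define CLR(t, f)	((t) &= ~(f))
#endif

struct pkt {
	uint32_t	flags;
};

/* Model of the shared output path: leaves link-layer flags alone. */
static void
mpls_output_model(struct pkt *p)
{
	(void)p;	/* label push and link-layer output would happen here */
}

/*
 * Model of a pseudowire (mpw-style) output path: an Ethernet broadcast
 * such as an ARP request rides inside a point-to-point MPLS tunnel, so
 * the outer packet should not go out as a link-layer broadcast.
 */
static void
pseudowire_output_model(struct pkt *p)
{
	assert(p != NULL);
	CLR(p->flags, PKT_BCAST | PKT_MCAST);
	mpls_output_model(p);
}
```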

Cheers,
dlg

> 
> 
> diff --git sys/netmpls/mpls_output.c sys/netmpls/mpls_output.c
> index b2be1fcc9..fe6e0ec42 100644
> --- sys/netmpls/mpls_output.c
> +++ sys/netmpls/mpls_output.c
> @@ -53,6 +53,9 @@ mpls_output(struct ifnet *ifp, struct mbuf *m, struct
> sockaddr *dst,
>   int  error;
>   u_int8_t ttl;
> 
> + /* reset broadcast and multicast flags, this is a P2P tunnel */
> + m->m_flags &= ~(M_BCAST | M_MCAST);
> +
>   if (rt == NULL || (dst->sa_family != AF_INET &&
>   dst->sa_family != AF_INET6 && dst->sa_family != AF_MPLS)) {
>   if (!ISSET(ifp->if_xflags, IFXF_MPLS))
> @@ -132,9 +135,6 @@ mpls_output(struct ifnet *ifp, struct mbuf *m,
> struct sockaddr *dst,
>   goto bad;
>   }
> 
> - /* reset broadcast and multicast flags, this is a P2P tunnel */
> - m->m_flags &= ~(M_BCAST | M_MCAST);
> -
>   smpls->smpls_label = shim->shim_label & MPLS_LABEL_MASK;
>   error = ifp->if_ll_output(ifp, m, smplstosa(smpls), rt);
>   return (error);



Re: Trouble forwarding between mpw's in bridge (6.4)

2019-04-01 Thread David Gwynne
Can you send me the hostname.* files and the output of ifconfig (showing all 
interfaces)?

You're using -current now, right?

dlg

> On 2 Apr 2019, at 08:15, lnel...@nelnet.org wrote:
> 
> 
>> Until recently
>> (https://github.com/openbsd/src/commit/dc68b945bbc883db108ac48a07bb89
>> 778b75582a)
>> bridge did split horizon detection by not allowing you to send
>> between
>> two mpw interfaces. In the case of a single VPLS this is the correct
>> thing, but more generally it isn't quite right. Particularly when you
>> want to bridge two seperate VPLS's. It's been removed now, and to
>> achieve proper VPLS functionality with the change applied I found I
>> had
>> to add all mpw interfaces in the same VPLS to the same protected
>> domain.
>> 
>> If you update to current your config will probably work, but be
>> mindful
>> that for a full mesh VPLS if you don't put them in a protected domain
>> you'll probably get a full mesh of broadcasts.
> 
> Thanks.  Your advice on upgrading the OS along with a hack of my own
> got me to a working state, but it isn't a sustainable or stable state.
> I installed the March 31 snapshot and the split-horizon problem was
> resolved.  However, there is still a problem with arp (and probably all
> broadcast traffic, but I never get past arp).  If I create a static arp
> for rtr01 on rtr02 and rtr02 on rtr01, then everything else works. I
> can send traffic back and forth between routers over the pseudowires.
> This is a hack that works for now, but it's not really a solution.
> 
> First of all the protected domain seems to do the opposite of what I
> need, but it may only appear to be the case because of the strangeness
> with broadcast.  When trying to ping (or send any traffic) between
> rtr01 and rtr02 and the two mpw2's are in the same protected domain,
> the arp requests die in the bridge.  The arp never shows up at all on
> the other mpw. If I remove the mpw's from the protected domain, then
> the arp traffic gets through to the other mpw, but it doesn't get sent
> out properly by MPLS.  It's sent out as MPLS broadcast traffic
> originating on the physical ethernet interface but with the right label
> for the pseudowire. Even though the arp request itself is broadcast
> traffic, I would expect it to be encapsulated in a unicast MPLS packet
> which is sent from the MAC of the bridge or the originating router
> and sent as unicast to the destination router with the pseudowire's
> label.  As it is now, even if the destination router could figure out
> what to do with these MPLS broadcast packets, it would respond to the
> physical interface and not the bridge.
> 
> Without the protected domain, this is what I see on both mpw
> interfaces:
>   11   4.015737 02:3b:c0:60:4c:95 → ff:ff:ff:ff:ff:ff ARP 42 Who has 192.168.99.2? Tell 192.168.99.3
>   12   4.015751 02:3b:c0:60:4c:95 → ff:ff:ff:ff:ff:ff ARP 42 Who has 192.168.99.2? Tell 192.168.99.3
>   13   5.015772 02:3b:c0:60:4c:95 → ff:ff:ff:ff:ff:ff ARP 42 Who has 
> 
> With the protected domain, I only see these packets on the incoming
> mpw.
> 
> The destination router sees this:
> 189   15.137231   6c:b3:11:4b:07:d4   ff:ff:ff:ff:ff:ff   MPLS  60  MPLS Label Switched Packet
> 202   16.161025   6c:b3:11:4b:07:d4   ff:ff:ff:ff:ff:ff   MPLS  60  MPLS Label Switched Packet
> 213   17.157232   6c:b3:11:4b:07:d4   ff:ff:ff:ff:ff:ff   MPLS  60  MPLS Label Switched Packet
> 
> 02:3b:c0:60:4c:95 is the originating router.
> 6c:b3:11:4b:07:d4 is the physical interface facing the destination
> router
> 
> By examining the MPLS packets I could see they were being sent to the
> right label.  I haven't figured out how to decode the payload, but it's
> 42 bytes which is the exact same length as the inbound arp packets.
> 
> Maybe I'm making wrong assumptions here.  I would expect that either
> the bridge does proxy arp or that the bridge would re-encapsulate
> broadcast packets back into unicast MPLS/VPLS packets on the pseudowire
> which then gets unencapsulated by the destination router and treated as
> broadcast there. Meanwhile, of course, it would also broadcast that
> same arp request out any other interface in the same bridge.
> 



Re: dhcrelay multiple instances possible bug

2019-03-04 Thread David Gwynne
Hi Riccardo,

dhcrelay only operates on a single interface, so you're not missing anything 
there.

Can you show me the ps output for the dhcrelay processes you start? The rcctl 
commands you show below don't include the rcctl start dhcrelay and 
dhcrelay_second bits.

I have the following in rc.local (mostly because this config predates rcctl):

foo=192.0.2.194
bar=192.0.2.196

echo -n 'start dhcp relays:'
for i in vlan371 vlan373 \
vlan835 \
vlan801 vlan847 vlan866 vlan867 \
vlan811 vlan815 vlan816 \
vlan1101 vlan1147 vlan1165 vlan1166 \
vlan1201 vlan1231 vlan1247 vlan1265 vlan1266 \
vlan1301 vlan1331 vlan1347 vlan1365 vlan1366 \
vlan971 vlan966 \
vlan1401 vlan1465 vlan1466 vlan1467 \
vlan1501 vlan1565 vlan1566 \
vlan1601 vlan1647 vlan1665 vlan1666 vlan1667 \
vlan1701 vlan1747 vlan1765 vlan1766 \
vlan1801 vlan1865 vlan1866 \
vlan1901 vlan1965 vlan1966 \
vlan2001 vlan2065 vlan2066 vlan2067 \
vlan2008 vlan2068 \
vlan2506 vlan2533 vlan2536 vlan2531 vlan2537 vlan2547; do
/usr/sbin/dhcrelay -i ${i} $foo $bar
echo -n " ${i}"
done
echo '.'

Which produces:

xdlg@shotgun1 pf$ ps -aux | grep dhc
_dhcp    40965  0.0  0.0   532  1008 ??  Ssp   10Nov17   12:06.67 /usr/sbin/dhcrelay -i vlan371 192.0.2.194 192.0.2.196
_dhcp    16825  0.0  0.0   536  1012 ??  Ssp   10Nov17    2:08.80 /usr/sbin/dhcrelay -i vlan867 192.0.2.194 192.0.2.196
_dhcp    69672  0.0  0.0   532  1076 ??  Isp   10Nov17    0:46.06 /usr/sbin/dhcrelay -i vlan866 192.0.2.194 192.0.2.196
_dhcp    48117  0.0  0.0   536   972 ??  Isp   10Nov17    0:00.02 /usr/sbin/dhcrelay -i vlan373 192.0.2.194 192.0.2.196
_dhcp    43065  0.0  0.0   540  1068 ??  Isp   10Nov17    0:06.02 /usr/sbin/dhcrelay -i vlan835 192.0.2.194 192.0.2.196
_dhcp    77793  0.0  0.0   540   988 ??  Ssp   10Nov17   19:26.92 /usr/sbin/dhcrelay -i vlan801 192.0.2.194 192.0.2.196
_dhcp    68793  0.0  0.0   540  1028 ??  Isp   10Nov17    0:08.40 /usr/sbin/dhcrelay -i vlan847 192.0.2.194 192.0.2.196
_dhcp    12879  0.0  0.0   540  1016 ??  Isp   10Nov17    1:14.46 /usr/sbin/dhcrelay -i vlan1101 192.0.2.194 192.0.2.196
_dhcp    10430  0.0  0.0   544  1052 ??  Ssp   10Nov17    1:42.55 /usr/sbin/dhcrelay -i vlan811 192.0.2.194 192.0.2.196
_dhcp    87753  0.0  0.0   544  1016 ??  Isp   10Nov17    0:31.65 /usr/sbin/dhcrelay -i vlan815 192.0.2.194 192.0.2.196
_dhcp    21434  0.0  0.0   536  1024 ??  Isp   10Nov17    0:00.20 /usr/sbin/dhcrelay -i vlan816 192.0.2.194 192.0.2.196
_dhcp    17816  0.0  0.0   540  1020 ??  Isp   10Nov17    0:00.00 /usr/sbin/dhcrelay -i vlan1147 192.0.2.194 192.0.2.196
_dhcp    67338  0.0  0.0   540  1020 ??  Isp   10Nov17    0:00.11 /usr/sbin/dhcrelay -i vlan1247 192.0.2.194 192.0.2.196
_dhcp    73549  0.0  0.0   540  1020 ??  Isp   10Nov17    0:00.55 /usr/sbin/dhcrelay -i vlan1165 192.0.2.194 192.0.2.196
_dhcp    78748  0.0  0.0   540  1012 ??  Isp   10Nov17    0:02.33 /usr/sbin/dhcrelay -i vlan1166 192.0.2.194 192.0.2.196
_dhcp    82689  0.0  0.0   540  1008 ??  Isp   10Nov17    2:02.18 /usr/sbin/dhcrelay -i vlan1201 192.0.2.194 192.0.2.196
_dhcp    31199  0.0  0.0   540   996 ??  Isp   10Nov17    0:07.63 /usr/sbin/dhcrelay -i vlan1231 192.0.2.194 192.0.2.196
_dhcp    21332  0.0  0.0   532  1004 ??  Isp   10Nov17    1:24.02 /usr/sbin/dhcrelay -i vlan1265 192.0.2.194 192.0.2.196
_dhcp    35688  0.0  0.0   544  1040 ??  Isp   10Nov17    0:00.28 /usr/sbin/dhcrelay -i vlan1347 192.0.2.194 192.0.2.196
_dhcp    36741  0.0  0.0   540  1032 ??  Isp   10Nov17    0:07.17 /usr/sbin/dhcrelay -i vlan1266 192.0.2.194 192.0.2.196
_dhcp    90274  0.0  0.0   544  1024 ??  Isp   10Nov17   19:17.78 /usr/sbin/dhcrelay -i vlan1301 192.0.2.194 192.0.2.196
_dhcp    42199  0.0  0.0   548  1052 ??  Isp   10Nov17    0:00.17 /usr/sbin/dhcrelay -i vlan1331 192.0.2.194 192.0.2.196
_dhcp    83979  0.0  0.0   528  1000 ??  Ssp   10Nov17    2:09.78 /usr/sbin/dhcrelay -i vlan1365 192.0.2.194 192.0.2.196
_dhcp    52142  0.0  0.0   536   792 ??  Isp   10Nov17    0:00.00 /usr/sbin/dhcrelay -i vlan965 192.0.2.194 192.0.2.196
_dhcp    17747  0.0  0.0   540   996 ??  Isp   10Nov17    0:05.03 /usr/sbin/dhcrelay -i vlan1366 192.0.2.194 192.0.2.196
_dhcp    85673  0.0  0.0   536   988 ??  Isp   10Nov17    0:11.59 /usr/sbin/dhcrelay -i vlan947 192.0.2.194 192.0.2.196
_dhcp      266  0.0  0.0   536   964 ??  Isp   10Nov17    0:01.84 /usr/sbin/dhcrelay -i vlan966 192.0.2.194 192.0.2.196
_dhcp    59857  0.0  0.0   540   984 ??  Isp   10Nov17    4:26.67 /usr/sbin/dhcrelay -i vlan1401 192.0.2.194 192.0.2.196
_dhcp    17159  0.0  0.0   536  1012 ??  Ssp   10Nov17    1:27.85 /usr/sbin/dhcrelay -i vlan971 192.0.2.194 192.0.2.196
_dhcp    67613  0.0  0.0   540  1028 ??  Isp   10Nov17    2:29.27 /usr/sbin/dhcrelay -i vlan1465 192.0.2.194 192.0.2.196
_dhcp    33040  0.0  0.0   536   840 ??  Isp   10Nov17    0:00.00 /usr/sbin/dhcrelay -i vlan1565 192.0.2.194 192.0.2.196
_dhcp     4850  0.0  0.0   544   844 ??  Isp  

Re: Packet loss with latest snapshot

2019-03-04 Thread David Gwynne
On Mon, Mar 04, 2019 at 10:36:23AM +0100, Tony Sarendal wrote:
> On Mon, 4 Mar 2019, 09:43 Tony Sarendal,  wrote:
> 
> >
> >
> > On Mon, 4 Mar 2019 at 09:26, Tony Sarendal wrote:
> >
> >> On Sun, 3 Mar 2019 at 21:35, Theo de Raadt wrote:
> >>
> >>> Tony,
> >>>
> >>> Are you out of your mind?  You didn't provide even a rough hint about
> >>> what your firewall configuration looks like.  You recognize that's
> >>> pathetic, right?
> >>>
> >>> > Earlier in the week I could run parallel ping-pong tests through my test
> >>> > firewalls at 300kpps without any packet loss. I updated to the latest
> >>> > snapshot today and started to see packet loss at around 80kpps.
> >>> >
> >>> > /T
> >>> >
> >>> > OpenBSD 6.5-beta (GENERIC.MP) #764: Sun Mar  3 10:24:08 MST 2019
> >>> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> >>> > real mem = 34300891136 (32711MB)
> >>> > avail mem = 33251393536 (31711MB)
> >>> > mpath0 at root
> >>> > scsibus0 at mpath0: 256 targets
> >>> > mainbus0 at root
> >>> > bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xec170 (34 entries)
> >>> > bios0: vendor American Megatrends Inc. version "3.0" date 04/24/2015
> >>> > bios0: Supermicro X10SLD
> >>> > acpi0 at bios0: rev 2
> >>> > acpi0: sleep states S0 S4 S5
> >>> > acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SSDT MCFG PRAD HPET SSDT SSDT SPMI DMAR EINJ ERST HEST BERT
> >>> > acpi0: wakeup devices PEGP(S4) PEG0(S4) PEGP(S4) PEG1(S4) PEGP(S4) PEG2(S4) PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) PXSX(S4) RP05(S4) [...]
> >>> > acpitimer0 at acpi0: 3579545 Hz, 24 bits
> >>> > acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> >>> > cpu0 at mainbus0: apid 0 (boot processor)
> >>> > cpu0: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz, 3500.68 MHz, 06-3c-03
> >>> > cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> >>> > cpu0: 256KB 64b/line 8-way L2 cache
> >>> > cpu0: smt 0, core 0, package 0
> >>> > mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> >>> > cpu0: apic clock running at 99MHz
> >>> > cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE
> >>> > cpu1 at mainbus0: apid 2 (application processor)
> >>> > cpu1: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz, 3500.01 MHz, 06-3c-03
> >>> > cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> >>> > cpu1: 256KB 64b/line 8-way L2 cache
> >>> > cpu1: smt 0, core 1, package 0
> >>> > cpu2 at mainbus0: apid 4 (application processor)
> >>> > cpu2: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz, 3500.01 MHz, 06-3c-03
> >>> > cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> >>> > cpu2: 256KB 64b/line 8-way L2 cache
> >>> > cpu2: smt 0, core 2, package 0
> >>> > cpu3 at mainbus0: apid 6 (application processor)
> >>> > cpu3: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz, 3500.01 MHz, 06-3c-03
> >>> > cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> >>> > cpu3: 256KB 64b/line 8-way L2 cache
> >>> > cpu3: smt 0, core 3, package 0
> >>> > ioapic0 at mainbus0: apid 8 pa 0xfec00000, version 20, 24 pins
> >>> > acpimcfg0 at acpi0
> >>> > acpimcfg0: addr 0xf8000000, bus 0-63
> >>> > acpihpet0 at acpi0: 14318179 Hz
> >>> > acpiprt0 at acpi0: bus 0 (PCI0)
> >>> > acpiprt1 at acpi0: bus 1 (PEG0)
> >>> > acpiprt2 at acpi0: bus 2 (PEG1)
> >>> > acpiprt3 at acpi0: bus -1 (PEG2)
> >>> > acpiprt4 at acpi0: bus 3 (RP01)
> >>> > acpiprt5 at acpi0: bus -1 (RP02)
> >>> > acpiprt6 at acpi0: 

Re: PPPoE vlan issue 6.4

2019-02-10 Thread David Gwynne
Hi Adam,

It sounds like you're on an ISP with very similar requirements to me. The exec 
summary of what my ISP wants is pppoe on vlan2, with the vlan priority forced 
to a single value.

Our (OpenBSD's) understanding of the priority field in VLAN headers is that it 
uses 802.1p for the field's value. 802.1p says that priorities 0 and 1 are swapped 
on the wire, and we use that consistently in the system, ie, the priority you 
see in tcpdump on a vlan interface is the same as what you configure for the 
priority value there, and vice versa. Everyone else seems to think 0 is 0 and 1 
is 1, which can be confusing.

My ISP wants priority 0 on the wire, which means 1 in OpenBSD.
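
To make the swap concrete, here is a small sketch (the function names are made up; only the 0/1 swap and the standard 802.1Q tag layout are taken as given) that maps an OpenBSD priority to the PCP value on the wire and packs it into the tag control information (TCI) field. Priority 1 on VLAN 2 gives TCI 0x0002, i.e. PCP 0, which is the "vlan 2, p 0" tag and the leading "0002" bytes in the DD-WRT capture quoted below.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map an OpenBSD priority (0-7) to the 802.1Q PCP value on the wire.
 * 802.1p orders priority 1 (background) below 0 (best effort), so the
 * two lowest values are swapped; 2-7 pass through unchanged.
 */
static unsigned int
prio_to_pcp(unsigned int prio)
{
	assert(prio <= 7);
	return (prio <= 1) ? !prio : prio;
}

/* The swap is its own inverse, so it also decodes received tags. */
static unsigned int
pcp_to_prio(unsigned int pcp)
{
	return prio_to_pcp(pcp);
}

/* Pack PCP (3 bits), DEI (1 bit, left at 0) and VLAN ID (12 bits). */
static uint16_t
vlan_tci(unsigned int prio, unsigned int vid)
{
	assert(vid <= 0xfff);
	return (uint16_t)((prio_to_pcp(prio) << 13) | vid);
}
```

For comparison, vlan_tci(0, 2) would put PCP 1 on the wire and yield 0x2002.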

I'm using an APU1, so I have re interfaces instead of em. I have re0 going to 
the ISP, and re1 is my internal network.

hostname.re0:
up

hostname.vlan2:
vnetid 2
parent re0
link0 llprio 1
up

hostname.pppoe0:
== pppoe0 ==
inet 0.0.0.0 255.255.255.255 0.0.0.1
pppoedev vlan2
authproto pap
authname 'dlg@the_isp' authkey 'secret'
group external
!/sbin/route add default -ifp pppoe0 0.0.0.1
up

hostname.re1:
inet 192.168.1.1/24


In OpenBSD 6.5 the syntax for priority on vlan frames is different. Instead of 
"link0" and "llprio 1" you just set "txprio 1".

While figuring this stuff out I used the APU as a bridge between the ISP-supplied 
router and the modem.

Hope this helps.

dlg


> On 10 Feb 2019, at 15:51, Adam Evans  wrote:
> 
> Some more debugging, a lot further but still no success.
> 
> I attached the DD-WRT modem directly to a computer to capture the PADI 
> packets.
> 
> Capturing from the DD-WRT modem directly, PADI packets look like the below:
> 
> 22:15:54.329145 a0:63:91:47:81:07 (oui Unknown) > Broadcast, ethertype 802.1Q (0x8100), length 36: vlan 2, p 0, ethertype PPPoE D, PPPoE PADI [Service-Name] [Host-Uniq 0xEE72]
>    0x0000:  0002 8863 1109 0000 000c 0101 0000 0103  ...c............
>    0x0010:  0004 ee72 0000                           ...r..
> 
> 
> On the other end of the wire at the client the packets look like:
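> 12:13:05.995412 a0:63:91:47:81:07 (oui Unknown) > Broadcast, ethertype PPPoE D (0x8863), length 60: PPPoE PADI [Service-Name] [Host-Uniq 0x622A]
>   0x0000:  1109 0000 000c 0101 0000 0103 0004 622a  ..............b*
>   0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
>   0x0020:  0000 0000 0000 0000 0000 838c 7a4d       ............zM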
> 
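> 12:13:20.277749 a0:63:91:47:81:07 (oui Unknown) > Broadcast, ethertype PPPoE D (0x8863), length 60: PPPoE PADI [Service-Name] [Host-Uniq 0xF02A]
>   0x0000:  1109 0000 000c 0101 0000 0103 0004 f02a  ...............*
>   0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
>   0x0020:  0000 0000 0000 0000 0000 e929 b08f       ...........)..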
> 
> From the above it looks like the PPPoE Discovery is not done over the vlan as 
> it gets stripped.
> 
> I updated the /etc/hostname.pppoe0 config to change pppoedev from vlan2 to 
> em0. I then plugged the device in to the bridged modem and brought up the 
> PPPoE interface which returned the below. I do not have IPv6 setup in my 
> PPPoE config so it looks like the remote tries to send me a IPv6 packet which 
> causes OpenBSD to send a terminate session response.
> 
> # ifconfig pppoe0 up
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp close(initial)
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp open(initial)
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp initial->starting
> Feb 10 13:18:48 foo /bsd: pppoe0: phase establish
> Feb 10 13:18:48 foo /bsd: pppoe0 (8863) state=1, session=0x0 output -> ff:ff:ff:ff:ff:ff, len=18
> Feb 10 13:18:48 foo /bsd: pppoe0 (8863) state=2, session=0x0 output -> 78:da:6e:de:db:d4, len=38
> Feb 10 13:18:48 foo /bsd: pppoe0: received unexpected PADO
> Feb 10 13:18:48 foo last message repeated 10 times
> Feb 10 13:18:48 foo /bsd: pppoe0: session 0xe84d connected
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp up(starting)
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp starting->req-sent
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp output  05-06-0f-4a-92-53-01-04-05-d4>
> Feb 10 13:18:48 foo /bsd: pppoe0 (8864) state=3, session=0xe84d output -> 78:da:6e:de:db:d4, len=22
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp input(req-sent):  len=18 01-04-05-d4-03-04-c0-23-05-06-b1-df-b5-ab-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00>
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp parse opts: mru auth-proto magic 
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp parse opt values: mru 1492 auth-proto magic 0xb1dfb5ab send conf-ack
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp output  01-04-05-d4-03-04-c0-23-05-06-b1-df-b5-ab>
> Feb 10 13:18:48 foo /bsd: pppoe0 (8864) state=3, session=0xe84d output -> 78:da:6e:de:db:d4, len=26
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp req-sent->ack-sent
> Feb 10 13:18:48 foo /bsd: pppoe0: lcp input(ack-sent):  len=14 05-06-0f-4a-92-53-01-04-05-d4-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00>
> Feb 

Re: SNMP reporting on VXLAN interfaces

2018-08-16 Thread David Gwynne
On Thu, Aug 16, 2018 at 10:51:25AM +1000, Jason Tubnor wrote:
> Hi,
> 
> Not sure if anyone else here is using SNMP for obtaining VXLAN(4) adapter
> throughput but after some testing (clamping with PF queues), I have
> discovered that throughput on VXLAN interfaces via SNMP is reported as
> exactly double the data throughput measured either through
> iperf or pfctl -vvsq.  Regular interfaces on the machine below (vmx) are
> reporting correctly.
> 
> Am I missing something here or could it be a potential bug in the VXLAN
> code in how it reports into snmpd?

The vxlan driver counts output packets and bytes itself, but the network
stack now does that for it, so every packet gets counted twice. The diff
below fixes the problem if you want to try it, but I will be committing
it soon.

Cheers,
dlg

Index: if_vxlan.c
===
RCS file: /cvs/src/sys/net/if_vxlan.c,v
retrieving revision 1.67
diff -u -p -r1.67 if_vxlan.c
--- if_vxlan.c  20 Feb 2018 01:20:37 -0000  1.67
+++ if_vxlan.c  17 Aug 2018 01:36:55 -0000
@@ -929,9 +929,6 @@ vxlan_output(struct ifnet *ifp, struct m
bridge_tunneluntag(m);
 #endif
 
-   ifp->if_opackets++;
-   ifp->if_obytes += m->m_pkthdr.len;
-
m->m_pkthdr.ph_rtableid = sc->sc_rdomain;
 
 #if NPF > 0
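
A toy model of the double count, assuming hypothetical struct and function names (this is not the real ifnet or vxlan code): once the generic output path counts every packet, a driver that also bumps the same counters reports each packet twice, which is the 2x that SNMP showed. Dropping the driver's own increments, as the diff does, brings the numbers back in line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an interface's output counters. */
struct if_counters {
	uint64_t opackets;
	uint64_t obytes;
};

/* Toy driver output routine.  driver_counts models the old vxlan code
 * that bumped the counters itself. */
static void
driver_output(struct if_counters *c, uint64_t len, int driver_counts)
{
	if (driver_counts) {
		c->opackets++;
		c->obytes += len;
	}
	/* ... encapsulate and pass the packet on ... */
}

/* Toy generic output path, which now counts on behalf of every driver. */
static void
stack_output(struct if_counters *c, uint64_t len, int driver_counts)
{
	assert(c != NULL);
	c->opackets++;
	c->obytes += len;
	driver_output(c, len, driver_counts);
}
```

Sending ten 1500-byte packets with driver_counts set reports 20 packets and 30000 bytes; with it cleared, 10 packets and 15000 bytes.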



Re: OSPF over gif on top of IPsec transport -current

2018-03-13 Thread David Gwynne

> On 10 Mar 2018, at 08:01, Remi Locherer  wrote:
> 
> 
> With below diff the setup works as expected: tcpdump shows OSPF hellos
> on gif0 and ospfd sees the neighbour.
> 
> I don't think it's the correct fix though.

functionally it is the correct fix.

when i reworked gif(4) in src/sys/net/if_gif.c r1.108, i merged the ipv4 and 
ipv6 input paths. the ipv6 input code had this check, but ipv4 did not. now it 
is applied to ipv4, but it is obviously wrong for both address families.

please commit the removal of this check, ok by me.

thank you to everyone for the bug report and debugging. i'm sorry for taking so 
long to figure this out. 

dlg 

> 
> 
> Index: if_gif.c
> ===
> RCS file: /cvs/src/sys/net/if_gif.c,v
> retrieving revision 1.112
> diff -u -p -r1.112 if_gif.c
> --- if_gif.c  28 Feb 2018 23:28:05 -0000  1.112
> +++ if_gif.c  9 Mar 2018 20:52:46 -0000
> @@ -745,8 +745,8 @@ gif_input(struct gif_tunnel *key, struct
>   }
>   
>   /* XXX What if we run transport-mode IPsec to protect gif tunnel ? */
> - if (m->m_flags & (M_AUTH | M_CONF))
> - return (-1);
> + //if (m->m_flags & (M_AUTH | M_CONF))
> + //  return (-1);
> 
>   key->t_rtableid = m->m_pkthdr.ph_rtableid;



Re: OSPF over gif on top of IPsec transport -current

2018-03-13 Thread David Gwynne

> On 11 Mar 2018, at 05:30, Atanas Vladimirov  wrote:
> 
> On 2018-03-10 00:01, Remi Locherer wrote:
>>> 
>> With below diff the setup works as expected: tcpdump shows OSPF hellos
>> on gif0 and ospfd sees the neighbour.
>> I don't think it's the correct fix though.
>> Index: if_gif.c
>> ===
>> RCS file: /cvs/src/sys/net/if_gif.c,v
>> retrieving revision 1.112
>> diff -u -p -r1.112 if_gif.c
>> --- if_gif.c 28 Feb 2018 23:28:05 -0000  1.112
>> +++ if_gif.c 9 Mar 2018 20:52:46 -0000
>> @@ -745,8 +745,8 @@ gif_input(struct gif_tunnel *key, struct
>>  }
>>  /* XXX What if we run transport-mode IPsec to protect gif tunnel ? */
>> -if (m->m_flags & (M_AUTH | M_CONF))
>> -return (-1);
>> +//if (m->m_flags & (M_AUTH | M_CONF))
>> +//  return (-1);
>>  key->t_rtableid = m->m_pkthdr.ph_rtableid;
> 
> Hi Remi,
> 
> Thanks for confirming that there is an issue and I'm not doing something 
> wrong on my side.
> I'll try the diff as soon as possible.

it isn't clear to me how ipsec and gif(4) are supposed to interact. on the one 
hand you have the gif(4) manpage saying this:

BUGS
 There are many tunnelling protocol specifications, defined differently
 from each other.  gif may not interoperate with peers which are based on
 different specifications, and are picky about outer header fields.  For
 example, you cannot usually use gif to talk with IPsec devices that use
 IPsec tunnel mode.

so it's saying that ipsec tunnel mode and gif don't work together, but then you 
have the code that remi is disabling saying that gif and ipsec transport mode 
don't work together.

i can understand the issue since a decrypted esp packet looks a lot like the 
packets gif wants to handle. if we change the code or doco to make something 
work, which way should we go?

right now i would use gre inside ipsec transport mode, not gif. it has the 
benefit of working, and it is harder for traffic inside the tunnel to leak out 
of ipsec. more specifically, gif handles 3 ip protocols, ipv4, ipv6, and mpls, 
which are ip protocol numbers 4, 41, and 137 respectively. it is likely that 
people could set up ipsec to protect ipv4, but forget about ipv6 and mpls. if 
you then configure v6 or mpls on the gif interface, that traffic will leak.

gre on the other hand is a single ip protocol, so more straightforward to 
protect. there's also a very clear line in the sand between the inner and outer 
traffic, which esp tunnel and transport mode lack.
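
To illustrate the leak described above, a small sketch (the helper and constant names are made up; 4, 41 and 137 come from the text, and 47 is the IANA number for GRE): it counts the protocols a tunnel can carry that a list of IPsec-protected protocols does not cover. A policy written only for IPv4-in-IP leaves IPv6 and MPLS uncovered on gif, while a policy for GRE covers everything a gre tunnel sends.

```c
#include <assert.h>
#include <stddef.h>

/* IANA-assigned IP protocol numbers, named locally to keep this portable. */
enum {
	PROTO_IPV4 = 4,   /* IPv4 encapsulated in IP */
	PROTO_IPV6 = 41,  /* IPv6 encapsulated in IP */
	PROTO_GRE  = 47,  /* Generic Routing Encapsulation */
	PROTO_MPLS = 137, /* MPLS encapsulated in IP */
};

/* What each tunnel type can put on the wire, per the description above. */
static const int gif_protos[] = { PROTO_IPV4, PROTO_IPV6, PROTO_MPLS };
static const int gre_protos[] = { PROTO_GRE };

static int
contains(const int *set, size_t n, int proto)
{
	for (size_t i = 0; i < n; i++)
		if (set[i] == proto)
			return 1;
	return 0;
}

/* Count the protocols a tunnel may send that a (hypothetical) list of
 * IPsec-protected protocols does not cover.  Each one counted here would
 * leave the host unencrypted if it were configured on the tunnel. */
static size_t
unprotected(const int *tun, size_t ntun, const int *prot, size_t nprot)
{
	size_t leaks = 0;

	assert(tun != NULL && prot != NULL);
	for (size_t i = 0; i < ntun; i++)
		if (!contains(prot, nprot, tun[i]))
			leaks++;
	return leaks;
}
```

With only IPv4-in-IP protected, unprotected(gif_protos, 3, ...) reports 2 leaking protocols; protecting GRE leaves 0 for a gre tunnel.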

dlg


Re: gif(4) changes vs tunnelbroker

2018-02-28 Thread David Gwynne

> On 1 Mar 2018, at 02:22, Andreas Bartelt <o...@bartula.de> wrote:
> 
> On 02/27/18 22:35, Pavel Korovin wrote:
>> On 02/28, David Gwynne wrote:
>>> what is the status of sysctl net.inet.ipip ?
>> David, thank you! That was easy :)
>> Sorry for the noise.
>> $ sysctl net.inet.ipip.allow
>> net.inet.ipip.allow=0
>> # sysctl -w net.inet.ipip.allow=1
>> net.inet.ipip.allow: 0 -> 1
>> $ ping6 www.google.com
>> PING www.google.com (2a00:1450:4013:c01::67): 56 data bytes
>> 64 bytes from 2a00:1450:4013:c01::67: icmp_seq=0 hlim=48 time=40.500 ms
>> 64 bytes from 2a00:1450:4013:c01::67: icmp_seq=1 hlim=48 time=40.645 ms
>> ^C
> 
> I'm also observing a breakage of a previously working IPv6 tunnelbroker 
> config on current (problem introduced since at least Feb, 23rd).
> 
> The combination of two things made it work again (or at least works around 
> the underlying problem):
> 1) sysctl net.inet.ipip.allow=1 [not yet documented at 
> www.openbsd.org/faq/current.html]
> 2) removing ``set state-policy if-bound'' from my pf.conf [which always 
> worked before with the same tunnelbroker setup]
> 
> According to pflog(4), a ping6 to some destination now looks buggy to me:
> - outgoing icmp6 echo request is only visible on gif(4)
> - incoming icmp6 echo reply is only visible on the underlying physical 
> interface of gif(4)
> which blocks the ping6 in the case of ``set state-policy if-bound''.

i found what i think is the problem.

it turns out the net.inet.ipip.allow sysctl was a red herring. it controls the 
processing of ipip by the network stack; it is not related to whether gif 
should accept packets. the problem was i got the mapping of ip addresses in 
incoming packets to the addresses on the tunnels wrong.

this should be fixed in src/sys/net/if_gif.c r1.112.
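
The mapping that has to be right is easy to state: on an incoming packet the roles are reversed, so the outer source address must match the tunnel's configured destination and the outer destination must match the tunnel's configured source. A minimal sketch of that lookup, IPv4 only, with a made-up tunnel struct (this is not the code in r1.112):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tunnel configuration (IPv4, host byte order), as set by
 * "ifconfig gifN tunnel src dst". */
struct tunnel {
	const char *name;
	uint32_t    src; /* local endpoint */
	uint32_t    dst; /* remote endpoint */
};

/*
 * Find the tunnel an incoming packet belongs to.  The packet was sent by
 * the remote end, so its source is our dst and its destination is our src.
 */
static const struct tunnel *
tunnel_lookup(const struct tunnel *t, size_t n, uint32_t ip_src,
    uint32_t ip_dst)
{
	assert(t != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (t[i].dst == ip_src && t[i].src == ip_dst)
			return &t[i];
	}
	return NULL;
}
```

Comparing the packet's source with the tunnel's source instead would match only packets this host sent, so nothing arriving from the peer would be accepted.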

sorry for the inconvenience.

dlg



Re: gif(4) changes vs tunnelbroker

2018-02-27 Thread David Gwynne


> On 27 Feb 2018, at 4:10 am, Pavel Korovin  wrote:
> 
> Dear all,
> 
> After upgrading several hosts to -current I noticed that all my IPv6 tunnels
> via tunnelbroker stopped working. Recently introduced changes to gif(4) (since
> late December 2017) are too complex for me to grasp, maybe someone on the list
> can advise.

hi pavel,

there was a window where gif only allowed configuration of the tunnel 
parameters while the interface was down, but still implicitly brought the 
interface up when addresses were configured. a lot of gif configs (or tunnel 
configs generally) set the ips before the tunnel endpoints, so the interface 
would go up and then refuse the tunnel configuration.

this has been fixed in -current, but a snap with the fix may not have made it 
out.
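
A toy model of that window, assuming hypothetical names and a plain -1 for "rejected" (this is not the actual gif(4) code): tunnel endpoints are only accepted while the interface is down, but assigning an address brings it up, so an address-first config leaves the tunnel unset while a tunnel-first config works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical interface state during the buggy window described above. */
struct toy_if {
	bool up;
	bool has_tunnel;
	bool has_addr;
};

/* Assigning an address implicitly brings the interface up. */
static void
set_addr(struct toy_if *ifp)
{
	assert(ifp != NULL);
	ifp->has_addr = true;
	ifp->up = true;
}

/* Tunnel endpoints are only accepted while the interface is down. */
static int
set_tunnel(struct toy_if *ifp)
{
	assert(ifp != NULL);
	if (ifp->up)
		return -1;	/* rejected: interface already up */
	ifp->has_tunnel = true;
	return 0;
}
```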

if this isn't the problem, can you send me your config and the state of the gif 
interfaces that are at fault and i'll see what else i broke.

cheers,
dlg

> 
> -- 
> With best regards,
> Pavel Korovin
> 



Re: re0 and re1 watchdog timeouts, and system freeze

2017-06-11 Thread David Gwynne
On Fri, Jun 09, 2017 at 07:19:34PM +0200, Björn Ketelaars wrote:
> On Fri 09/06/2017 12:07, Martin Pieuchot wrote:
> > On 08/06/17(Thu) 20:38, Björn Ketelaars wrote:
> > > On Thu 08/06/2017 16:55, Martin Pieuchot wrote:
> > > > On 07/06/17(Wed) 09:43, Björn Ketelaars wrote:
> > > > > On Sat 03/06/2017 08:44, Björn Ketelaars wrote:
> > > > > > 
> > > > > > Reverting back to the previous kernel fixed the issue above.
> > > > > > Question: can someone give a hint on how to track this issue?
> > > > > 
> > > > > After a bit of experimenting I'm able to reproduce the problem.
> > > > > Summary is that queueing in pf and use of a current (after May 30),
> > > > > multi processor kernel (bsd.mp from snapshots) causes these specific
> > > > > watchdog timeouts followed by a system freeze.
> > > > > 
> > > > > Issue is 'gone' when:
> > > > > 1.) using an older kernel (before May 30);
> > > > > 2.) removal of queueing statements from pf.conf. Included below the
> > > > > specific snippet;
> > > > > 3.) switch from MP kernel to SP kernel.
> > > > > 
> > > > > New observation is that while queueing, using a MP kernel, the
> > > > > download bandwidth is only a fraction of what is expected. Exchanging
> > > > > the MP kernel with a SP kernel restores the download bandwidth to
> > > > > expected level.
> > > > > 
> > > > > I'm guessing that this issue is related to recent work on PF?
> > > > 
> > > > It's certainly a problem in, or exposed by, re(4) with the recent MP
> > > > work in the network stack.
> > > > 
> > > > It would help if you could build a kernel with MP_LOCKDEBUG defined and
> > > > see if the resulting kernel enters ddb(4) instead of freezing.
> > > > 
> > > > Thanks,
> > > > Martin
> > > 
> > > Thanks for the hint! It helped in entering ddb. I collected a bit of
> > > output, which you can find below. If I read the trace correctly the
> > > crash is related to line 1750 of sys/dev/ic/re.c:
> > > 
> > >   d->rl_cmdstat |= htole32(RL_TDESC_CMD_EOF);
> > 
> > Could you test the diff below, always with a MP_LOCKDEBUG kernel and
> > tell us if you can reproduce the freeze or if the kernel enters ddb(4)?
> > 
> > Another question, how often do you see "watchdog timeout" messages?
> > 
> > Index: re.c
> > ===
> > RCS file: /cvs/src/sys/dev/ic/re.c,v
> > retrieving revision 1.201
> > diff -u -p -r1.201 re.c
> > --- re.c  24 Jan 2017 03:57:34 -0000  1.201
> > +++ re.c  9 Jun 2017 10:04:43 -0000
> > @@ -2074,9 +2074,6 @@ re_watchdog(struct ifnet *ifp)
> > s = splnet();
> > printf("%s: watchdog timeout\n", sc->sc_dev.dv_xname);
> >  
> > -   re_txeof(sc);
> > -   re_rxeof(sc);
> > -
> > re_init(ifp);
> >  
> > splx(s);
> 
> The diff (with a MP_LOCKDEBUG kernel) resulted in similar traces as before.
> ddb Output is included below.
> 
> With your diff the number of timeout messages decreased from 9 to 2 before
> entering ddb.

can you try the diff below please?

Index: hfsc.c
===
RCS file: /cvs/src/sys/net/hfsc.c,v
retrieving revision 1.39
diff -u -p -r1.39 hfsc.c
--- hfsc.c  8 May 2017 11:30:53 -0000   1.39
+++ hfsc.c  12 Jun 2017 05:08:01 -0000
@@ -817,7 +817,7 @@ hfsc_deferred(void *arg)
KASSERT(HFSC_ENABLED(ifq));
 
if (!ifq_empty(ifq))
-   (*ifp->if_qstart)(ifq);
+   ifq_start(ifq);
 
hif = ifq->ifq_q;
 



Re: SCSI Enclosure Service

2017-06-08 Thread David Gwynne
hey jens,

from what i can tell, you talk to the ami mg9071 chips on that enclosure using 
sgpio, not in band using smp (sas mgmt protocol) or ses as a scsi device.

i get the impression that mpii hardware does have some understanding of 
enclosures connected via sgpio, but i'm not sure what benefit it would provide. 
it may affect addressing on the bus, but i'm not sure you'd get temperatures or 
fan speeds or anything off it.

cheers,
dlg

> On 9 Jun 2017, at 02:05, Jens A. Griepentrog wrote:
> 
> Dear Listeners,
> 
> Let me know, please, if enclosure monitoring
> is supported for disks attached to Supermicro
> M28SAB drive cages (with two AMI MG9071 chips)
> or similar backplanes. Drives work fine when
> attached to some LSI 2008 controller but there
> appear no "ses* at scsibus?" boot messages
> (see below, disks attached to the drive cage
> are sd4 ... sd11), jumper settings on the cage:
> JP61 2-3: Fan disabled (there is no fan)
> JP62 1-2: Enclosure monitor enabled
> 
> With best regards,
> Jens
> 
> 
> 
> OpenBSD 6.1 (GENERIC.MP) #6: Mon May 22 20:34:30 CEST 2017
> rob...@syspatch-61-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 17154113536 (16359MB)
> avail mem = 16629547008 (15859MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xf06f0 (62 entries)
> bios0: vendor American Megatrends Inc. version "0705" date 06/29/2010
> bios0: ASUSTeK Computer INC. P7F-M WS
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S1 S3 S4 S5
> acpi0: tables DSDT FACP APIC MCFG OEMB HPET SSDT
> acpi0: wakeup devices BR1E(S4) UAR1(S4) PS2K(S4) EUSB(S4) USB0(S4) USB1(S4) USB2(S4) USB3(S4) USBE(S4) USB4(S4) USB5(S4) USB6(S4) BR21(S4) BR22(S4) BR23(S4) P0P1(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU L3426 @ 1.87GHz, 1867.00 MHz
> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: TSC frequency 1867000680 Hz
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 133MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Xeon(R) CPU L3426 @ 1.87GHz, 1866.73 MHz
> cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: Intel(R) Xeon(R) CPU L3426 @ 1.87GHz, 1866.73 MHz
> cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 6 (application processor)
> cpu3: Intel(R) Xeon(R) CPU L3426 @ 1.87GHz, 1866.73 MHz
> cpu3: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR
> cpu3: 256KB 64b/line 8-way L2 cache
> cpu3: smt 0, core 3, package 0
> ioapic0 at mainbus0: apid 7 pa 0xfec0, version 20, 24 pins
> acpimcfg0 at acpi0 addr 0xe000, bus 0-255
> acpihpet0 at acpi0: 14318179 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 7 (BR1E)
> acpiprt2 at acpi0: bus -1 (BR21)
> acpiprt3 at acpi0: bus -1 (BR22)
> acpiprt4 at acpi0: bus -1 (BR23)
> acpiprt5 at acpi0: bus -1 (P0P1)
> acpiprt6 at acpi0: bus 1 (P0P3)
> acpiprt7 at acpi0: bus -1 (P0P4)
> acpiprt8 at acpi0: bus -1 (P0P5)
> acpiprt9 at acpi0: bus -1 (P0P6)
> acpiprt10 at acpi0: bus 2 (BR20)
> acpiprt11 at acpi0: bus 5 (BR26)
> acpiprt12 at acpi0: bus 6 (BR27)
> acpicpu0 at acpi0: !C3(350@17 mwait.1@0x20), !C3(500@17 mwait.1@0x10), 
> C1(1000@1 mwait.1), PSS
> acpicpu1 at acpi0: !C3(350@17 mwait.1@0x20), !C3(500@17 mwait.1@0x10), 
> C1(1000@1 mwait.1), PSS
> acpicpu2 at acpi0: !C3(350@17 mwait.1@0x20), !C3(500@17 mwait.1@0x10), 
> C1(1000@1 mwait.1), PSS
> acpicpu3 at acpi0: !C3(350@17 mwait.1@0x20), !C3(500@17 mwait.1@0x10), 
> C1(1000@1 mwait.1), PSS
> "PNP0501" at acpi0 not configured
> "PNP0303" at acpi0 not configured
> acpibtn0 at acpi0: PWRB
> ipmi at mainbus0 not configured
> cpu0: Enhanced SpeedStep 1867 MHz: speeds: 1868, 1867, 

Re: Does CARP need Layer 2 ?

2017-04-17 Thread David Gwynne

> On 18 Apr 2017, at 03:54, Bob Jones wrote:
> 
> Hi,
> 
> Looking at the docs, unlike pfsync, sasyncd and everything else, you
> seem to be unable to define a "different" interface to CARP for the
> purposes of monitoring.  Everything seems to need to go over the one
> carpdev.
> 
> My question arises because I have a couple of OpenBSD units due to
> be plugged into upstream router ports (direct patch, not via
> intermediate switch).
> 
> Obviously for most things, OSPF and BGP will take care of redundancy.
> But for the purposes of VPN failover, I would like to use CARP on my
> "external" interfaces, but as far as my interpretation of the docs go,
> CARP protocol won't work over Layer 3 ?

that's correct.

> Could someone provide further insight into whether my interpretation
> is correct, and whether I have any other options available ?  I don't
> really want to go adding a layer 2 switch on my side because that just
> introduces an extra point of failure.

off the top of my head, you have two paths you could take.

firstly, you could advertise the vpn service on the same ip addresses bound to 
loopback (lo(4)) interfaces on each of the hosts, i.e., a cheap and cheerful 
anycast setup. bgp as your routing protocol should work well for this if you're 
interested in an active/passive setup.

the second option is to set up an l2 medium between your hosts. specifically, 
you can set up etherip tunnels between them and land your carp interface on 
that.

just some ideas.

cheers,
dlg



Re: Per-device multiqueuing would be fantastic. Are there any plans? Are donations a matter here?

2017-02-10 Thread David Gwynne
> On 9 Feb 2017, at 7:11 pm, Mikael <mikael.ml...@gmail.com> wrote:
>
> 2017-02-09 16:41 GMT+08:00 David Gwynne <da...@gwynne.id.au>:
> ..
> hey mikael,
>
> can you be more specific about what you mean by multiqueuing for disks? even a
> reference to an implementation of what you’re asking about would help me
> answer this question.
>
> ill write up a bigger reply after my kids are in bed.
>
> cheers,
> dlg
>
> Hi David,
>
> Thank you for your answer.
>
> The other OpenBSD:ers I talked to also used the wording "multiqueue". My
> understanding of the kernel's workings here is too limited.
>
> If I would give a reference to some implementation out there, I guess I
> would point to the one introduced in Linux 3.13/3.16:
>
> "Linux Block IO: Introducing Multi-queue SSD Access on Multi-core Systems"
> http://kernel.dk/blk-mq.pdf
>
> "Linux Multi-Queue Block IO Queueing Mechanism (blk-mq)"
> https://www.thomas-krenn.com/en/wiki/Linux_Multi-Queue_Block_IO_Queueing_Mechanism_(blk-mq)
>
> "The multiqueue block layer"
> https://lwn.net/Articles/552904/
>
> Looking forward a lot to your followup.

sorry, i fell asleep too.

thanks for the links to info on linux mq stuff. i can understand what it
provides. however, in the situation you are testing, im not sure it is
necessarily the way to address the difference in performance you’re
seeing in your environment.

anyway, tldr: you’re suffering under the kernel's big giant lock.

according to the dmesg you provided, you’re testing a single ssd (a samsung
850) connected to a sata controller (ahci). with this equipment, all operations
between the computer and the actual disk are issued through ahci. because of
the way ahci operates, operations on a specific disk are effectively serialised
at this point. in your setup you have multiple cpus though, and it sounds like
your benchmark runs on them concurrently, issuing io through the kernel to the
disk via ahci.

two things are obviously different between linux and openbsd that would affect
this benchmark. the first is that io to physical devices is limited to a value
called MAXPHYS in the kernel, which is 64 kilobytes. any larger read
operations issued by userland to the kernel get cut up into a series of 64k
reads against the disk. ahci itself can handle 4 meg per transfer.

the other difference is that, like most of the kernel, read() is serialised by
the big lock. the result is that if you have userland on multiple cpus
creating a heavily io bound workload, all the cpus end up waiting for each
other to run. while one cpu is running through the io stack down to ahci,
every other cpu is spinning waiting for its turn to do the same thing.

the distance between userland and ahci is relatively long. going through the
buffer cache (i.e., /dev/sd0) is longer than bypassing it (through /dev/rsd0).
your test results confirm this.

the solution to this problem is to look at taking the big lock away from the
io paths. this is non-trivial work though.

i have already spent time working on making sd(4) and the scsi midlayer
mpsafe, but haven’t been able to take advantage of that work because both
sides of the scsi subsystem (adapters like ahci and the block layer and
syscalls) still need the big lock. some adapters have been made mpsafe, but i
dont think ahci was on that list. when i was playing with mpsafe scsi, i gave
up the big lock at the start of sd(4) and ran it, the midlayer, and mpi(4) or
mpii(4) unlocked. if i remember correctly, even just unlocking that part of
the stack doubled the throughput of the system.

the work ive done in the midlayer should mean that if we can access it without
the big lock, accesses to disks beyond adapters like ahci should scale pretty
well across cpu cores because of how io is handed over to the midlayer.
concurrent submissions by multiple cpus end up delegating one of the cpus to
operate on the adapter on behalf of all the cpus. while that first cpu is
still submitting to the hardware, other cpus are not blocked from queuing more
work and returning to userland.

i can go into more detail if you want.

cheers,
dlg



Re: Per-device multiqueuing would be fantastic. Are there any plans? Are donations a matter here?

2017-02-09 Thread David Gwynne
> On 9 Feb 2017, at 12:42 pm, Mikael  wrote:
>
> Hi misc@,
>
> The SSD reading benchmark in the previous email shows that per-device
> multiqueuing will boost multithreaded random read performance very much
> e.g. by ~7X+, e.g. the current 50MB/sec will increase to ~350MB/sec+.
>
> (I didn't benchmark yet but I suspect the current 50MB/sec is system-wide,
> whereas with multiqueuing the 350MB/sec+ would be per drive.)
>
> Multiuser databases, and any parallel file reading activity, will/would
> see a proportional speedup with multiqueuing.

hey mikael,

can you be more specific about what you mean by multiqueuing for disks? even a
reference to an implementation of what you’re asking about would help me
answer this question.

ill write up a bigger reply after my kids are in bed.

cheers,
dlg

>
>
> Do you have plans to implement this?
>
> Was anything done to this end already, any idea when multiqueueing can
> happen?
>
>
> Are donations a matter here, if so about what size of donations and to who?
>
> Someone suggested that implementing it would take a year of work.
>
> Any clarifications of what's going on and what's possible and how would be
> much appreciated.
>
>
> Thanks,
> Mikael



Re: NVM Express (NVMe) support status

2016-04-15 Thread David Gwynne
> On 12 Feb 2016, at 7:01 PM, Evgeniy Sudyr  wrote:
>
> Hi all,
>
> I'm looking for the status of NVM Express support in -current (got an Intel 750
> consumer device
> https://www-ssl.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-750-series.html
> for home desktop, but it looks like all devices are using the same
> Specification).
>
> I found 2 commits of nvme_pci.c from @dlg there:
>
> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pci/nvme_pci.c
>
> But the commit message suggests the work was abandoned because of problems encountered.
>
> I found the specification here:
> http://www.nvmexpress.org/specifications/
>
> It also works for me under Linux and NVMe driver is maintained by
> Intel developer Matthew Wilcox.
> https://github.com/torvalds/linux/tree/master/drivers/nvme
>
> Looks like it's already implemented in FreeBSD (haven't tested it yet):
>
> http://svnweb.freebsd.org/base/head/sys/dev/nvme/nvme.h?view=log=240616
> https://svnweb.freebsd.org/base/head/sys/dev/nvme/
>
> It will be great to get this "awesome fast" storage support in next
> OpenBSD release(s).
>
> Anybody aware of any plans on this?

it might work if you give it a go now.



Re: Gif tunnel / pf / queueing

2016-03-02 Thread David Gwynne
> On 2 Mar 2016, at 1:51 AM, Christopher Sean Hilton wrote:
>
> I would like to apply queueing to packets traversing a gif tunnel. I'd
> like to know which works better: tagging outbound packets on the gif
> interface and applying them to queues by tag when they leave on the
> external interface? Or assigning packets to the queues directly when
> they are on the gif interface?
>
> If I understand things correctly queues work on interfaces. That leads
> me to think that tagging for later queueing is the better approach.

in this instance it shouldn't matter. however, if you have multiple outgoing
interfaces the gif traffic can leave on, it's better to apply the policy on
the gif interface.

>
> --
> Chris
>
> Christopher Sean Hilton[chris/at/vindaloo/dot/com]
>



Re: PF: can't make queueing and priority work as expected

2016-01-15 Thread David Gwynne
> On 15 Jan 2016, at 9:07 PM, Craig Skinner <skin...@britvault.co.uk> wrote:
>
> On 2016-01-15 Fri 12:53 PM |, David Gwynne wrote:
>>> On 13 Jan 2016, at 19:19, Marko Cupać <marko.cu...@mimar.rs> wrote:
>>>
>>> Have we come to conclusion that currently prio makes no sense at all?
>>
>> it wont have the effect you want. that doesn't mean it doesn't make sense
>> somewhere else.
>>
>
> Such as an ADSL PPPoE bridge?

yeah.

the other thing to note is that loading a ruleset resets the assignment of
existing states to queues.

states are assigned to queues via rules, but if the rules go away (which is
what happens when you load a new ruleset) the intermediary between rules and
queues has gone.

it kind of sucks, especially for testing.

dlg



Re: PF: can't make queueing and priority work as expected

2016-01-14 Thread David Gwynne
> On 13 Jan 2016, at 19:19, Marko Cupać <marko.cu...@mimar.rs> wrote:
>
> On Tue, 12 Jan 2016 16:40:58 +0100
> Claudio Jeker <cje...@diehard.n-r-g.com> wrote:
>
>> On Tue, Jan 12, 2016 at 05:33:06AM -0700, Daniel Melameth wrote:
>>> On Mon, Jan 11, 2016 at 9:37 PM, David Gwynne <da...@gwynne.id.au> wrote:
>>>>> On 11 Jan 2016, at 22:43, Daniel Melameth <dan...@melameth.com> wrote:
>>>>> On Sun, Jan 10, 2016 at 7:58 AM, Marko Cupać <marko.cu...@mimar.rs> wrote:
>>>>>> On Sat, 9 Jan 2016 11:11:27 -0700
>>>>>> Daniel Melameth <dan...@melameth.com> wrote:
>>>>>>> You NEED to set a max on your ROOT queues.
>>>>>> I came to this conclusion as well. But not only on root queues.
>>>>>> For example, when max is set on root queue but only bandwidth
>>>>>> on child queues, no shaping takes place...
>>>>> This works for me.
>>>>>> Or, to cut the long story short, if someone can paste queue
>>>>>> definition which accomplishes 'give both queues max bandwidth,
>>>>>> but throttle traffic from first queue when traffic from the
>>>>>> second one arrives', I will be more than happy to quit
>>>>>> bothering misc@ list readers with my rants and observations.
>>>>> I would expect this to be possible with prio alone, but I've
>>>>> never been able to get it to work.  Perhaps I'm misunderstanding
>>>>> how prio works.
>>>> prio is basically an array of lists of packets to be transmitted. high
>>>> priority packets go on a different list to low priority packets.
>>>>
>>>> the problem is the way packets go on and off these lists. basically as soon
>>>> as a packet is queued on one of these lists for transmission, we call the
>>>> driver immediately to send it. generally as soon as a packet is queued on
>>>> the interface, it immediately gets dequeued by the driver and transmitted on
>>>> the hardware.
>>>>
>>>> it is only when you build up a backlog of packets that priq can come into
>>>> effect. the only way you can build up a backlog of packets is if your
>>>> hardware is slower at transmitting packets than the thing that generates
>>>> these packets to send.
>>>>
>>>> in your case you're probably getting packets from a relatively slow
>>>> internet connection and transmitting them on a high speed local network.
>>>> the transmit hardware is almost certainly going to be faster than your
>>>> source of packets, so you'll never build up a queue of backlogged packets,
>>>> so prio is effectively a nop.
>>>>
>>>> dlg
>>>
>>> Thanks for taking the time to chime in guys.  Prior to implementing
>>> any queueing, I tested this stuff out on a LAN--so no slower
>>> connections were involved--and I was unable to see prio in action, at
>>> least not with any observable similarity to ALTQ's PRIQ.
>>>
>>> A simple rule set:
>>>
>>> match out on egress proto tcp to port 12345 set prio 7
>>> match out on egress proto tcp to port 12346 set prio 0
>>> pass
>>>
>>> Using tcpbench to push packets into both queues, I would have
>>> expected the packets destined for port 12346 to get throttled, but
>>> both flows simply reached an equilibrium, which I would have
>>> expected without prio.  Under PRIQ, I would have seen the flow to
>>> port 12346 get almost completely starved of bandwidth.  When doing
>>> non-prio queuing with a similarly simple ruleset, both flows
>>> properly matched their target bandwidth.
>>
>> This assumes that you manage to fill the TX interface queue to a level
>> that it always fills the tx DMA rings before being empty. On high
>> speed interfaces this is most of the time not the case and so both
>> sessions are able to reach the maximum bandwidth.
>> To be honest prio queues only make sense when you have a slow interface
>> (10Mbps) or a shaper in place that causes the queue to fill up.
>> There is currently no shaper you can use together with the prio
>> queues so only option one remains.
>>
>
> Have we come to conclusion that currently prio makes no sense at all?

it wont have the effect you want. that doesn't mean it doesn't make sense
somewhere else.

>
> Can I hope that saying 'currently' means this is not the intended
> design? Or should I come to peace with the fact that with OpenBSD and
> PF I can forget about shaping inbound TCP traffic in a way that
> child queues can expand to max link bandwidth unless there is a
> congestion, while in congestion admin can choose which child queues to
> throttle and in which order?

hfsc might need some work at the code level, or it might just suck to configure.

>
> --
> Before enlightenment - chop wood, draw water.
> After  enlightenment - chop wood, draw water.
>
> Marko Cupać
> https://www.mimar.rs/



Re: PF: can't make queueing and priority work as expected

2016-01-11 Thread David Gwynne
> On 11 Jan 2016, at 22:43, Daniel Melameth  wrote:
>
> On Sun, Jan 10, 2016 at 7:58 AM, Marko Cupać  wrote:
>> On Sat, 9 Jan 2016 11:11:27 -0700
>> Daniel Melameth  wrote:
>>> You NEED to set a max on your ROOT queues.
>> I came to this conclusion as well. But not only on root queues. For
>> example, when max is set on root queue but only bandwidth on child
>> queues, no shaping takes place...
>
> This works for me.
>
>> Or, to cut the long story short, if someone can paste queue definition
>> which accomplishes 'give both queues max bandwidth, but throttle
>> traffic from first queue when traffic from the second one arrives', I
>> will be more than happy to quit bothering misc@ list readers with my
>> rants and observations.
>
> I would expect this to be possible with prio alone, but I've never
> been able to get it to work.  Perhaps I'm misunderstanding how prio
> works.

prio is basically an array of lists of packets to be transmitted. high
priority packets go on a different list to low priority packets.

the problem is the way packets go on and off these lists. basically as soon as
a packet is queued on one of these lists for transmission, we call the driver
immediately to send it. generally as soon as a packet is queued on the
interface, it immediately gets dequeued by the driver and transmitted on the
hardware.

it is only when you build up a backlog of packets that priq can come into
effect. the only way you can build up a backlog of packets is if your hardware
is slower at transmitting packets than the thing that generates these packets
to send.

in your case you're probably getting packets from a relatively slow internet
connection and transmitting them on a high speed local network. the transmit
hardware is almost certainly going to be faster than your source of packets,
so you'll never build up a queue of backlogged packets, so prio is effectively
a nop.

dlg



Re: authentication infra structure

2015-12-10 Thread David Gwynne
> On 10 Dec 2015, at 5:25 AM, Friedrich Locke wrote:
>
> If you had about 10k users and 5k machines, how would you manage
> authentication? Keep in mind that this is a very heterogeneous
> environment with ldap, ftp, smtp, pop3, traditional unix boxes etc 

we use ypldap(8) talking to our directory to make users appear on the box. nfs
for homedirs. at the moment we're using krb for auth, but im looking to change
that soon.

dlg



Re: Chelsio T4 10g adapters support ?

2015-12-09 Thread David Gwynne
> On 10 Dec 2015, at 12:28, Brendan Horan  wrote:
>
> Hi,
>
> I am looking at building a system running OpenBSD to deal with 10g networks.
>
> It would seem there is good support for Intel cards via the "ix" driver.
> However I was looking at Chelsio cards.
> It seems the "che" driver only supports T3 series and the PE9000 cards.
>
> However the T3 series is PCIe 1.1,
> not exactly useful on a dual port 10gbe card.
> Thus I was looking at the T4 series cards.
>
> Would there be much needed to get one of them working on OpenBSD ?
> If the answer to that is "no clue",
> would the card make a good donation to someone at OpenBSD?
> FreeBSD has support for T4 cards if that helps.
>
> I am still unsure if I want this card or an Intel card at this point.
>
> Thanks for your time

you want an ix(4) for now.

there's a few 10g chips we dont have support for yet, but developer time is
more of a constraint than lack of hardware at the moment.

dlg



Re: em(4) watchdog timeouts

2015-11-15 Thread David Gwynne
On Fri, Nov 13, 2015 at 10:18:51AM -0500, Sonic wrote:
> On Wed, Nov 11, 2015 at 9:20 AM, Gregor Best  wrote:
> > I've done some further testing and I think I've narrowed it down to the
> > "Unlocking em(4) a bit further"-patch [0].

could you try this? its not written with the wdog stuff in mind,
but it does touch that stuff so it might help.

Index: if_em.c
===
RCS file: /cvs/src/sys/dev/pci/if_em.c,v
retrieving revision 1.310
diff -u -p -r1.310 if_em.c
--- if_em.c 29 Oct 2015 03:19:42 -  1.310
+++ if_em.c 15 Nov 2015 14:01:39 -
@@ -605,16 +605,20 @@ em_start(struct ifnet *ifp)
}
 
for (;;) {
-   IFQ_POLL(&ifp->if_snd, m_head);
-   if (m_head == NULL)
-   break;
-
-   if (em_encap(sc, m_head)) {
+   if (sc->num_tx_desc_avail < EM_MAX_SCATTER + 2) {
ifp->if_flags |= IFF_OACTIVE;
break;
}
 
IFQ_DEQUEUE(&ifp->if_snd, m_head);
+   if (m_head == NULL)
+   break;
+
+   if (em_encap(sc, m_head)) {
+   m_freem(m_head);
+   ifp->if_oerrors++;
+   continue;
+   }
 
 #if NBPFILTER > 0
/* Send a copy of the frame to the BPF listener */
@@ -622,9 +626,6 @@ em_start(struct ifnet *ifp)
bpf_mtap_ether(ifp->if_bpf, m_head, BPF_DIRECTION_OUT);
 #endif
 
-   /* Set timeout in case hardware has problems transmitting */
-   ifp->if_timer = EM_TX_TIMEOUT;
-
post = 1;
}
 
@@ -637,8 +638,11 @@ em_start(struct ifnet *ifp)
 * this tells the E1000 that this frame is
 * available to transmit.
 */
-   if (post)
+   if (post) {
E1000_WRITE_REG(&sc->hw, TDT, sc->next_avail_tx_desc);
+
+   ifp->if_timer = EM_TX_TIMEOUT;
+   }
}
 }
 
@@ -1104,12 +1108,6 @@ em_encap(struct em_softc *sc, struct mbu
struct em_buffer   *tx_buffer, *tx_buffer_mapped;
struct em_tx_desc *current_tx_desc = NULL;
 
-   /* Check that we have least the minimal number of TX descriptors. */
-   if (sc->num_tx_desc_avail <= EM_TX_OP_THRESHOLD) {
-   sc->no_tx_desc_avail1++;
-   return (ENOBUFS);
-   }
-
if (sc->hw.mac_type == em_82547) {
bus_dmamap_sync(sc->txdma.dma_tag, sc->txdma.dma_map, 0,
sc->txdma.dma_map->dm_mapsize,
@@ -1147,9 +1145,6 @@ em_encap(struct em_softc *sc, struct mbu
 
EM_KASSERT(map->dm_nsegs!= 0, ("em_encap: empty packet"));
 
-   if (map->dm_nsegs > sc->num_tx_desc_avail - 2)
-   goto fail;
-
if (sc->hw.mac_type >= em_82543 && sc->hw.mac_type != em_82575 &&
sc->hw.mac_type != em_82580 && sc->hw.mac_type != em_i210 &&
sc->hw.mac_type != em_i350)
@@ -1168,9 +1163,9 @@ em_encap(struct em_softc *sc, struct mbu
 * Check the Address and Length combination and
 * split the data accordingly
 */
-   array_elements = em_fill_descriptors(map->dm_segs[j].ds_addr,
-   map->dm_segs[j].ds_len,
-   &desc_array);
+   array_elements = em_fill_descriptors(
+   map->dm_segs[j].ds_addr,
+   map->dm_segs[j].ds_len, &desc_array);
for (counter = 0; counter < array_elements; counter++) {
if (txd_used == sc->num_tx_desc_avail) {
sc->next_avail_tx_desc = txd_saved;
@@ -2481,8 +2476,7 @@ em_txeof(struct em_softc *sc)
 * If we have enough room, clear IFF_OACTIVE to tell the stack
 * that it is OK to send packets.
 */
-   if (ISSET(ifp->if_flags, IFF_OACTIVE) &&
-   num_avail > EM_TX_OP_THRESHOLD) {
+   if (num_avail > 0 && ISSET(ifp->if_flags, IFF_OACTIVE)) {
KERNEL_LOCK();
CLR(ifp->if_flags, IFF_OACTIVE);
em_start(ifp);



Re: Dell S300 controller

2015-05-08 Thread David Gwynne
 On 8 May 2015, at 12:41 pm, Jim Giannoules j...@devio.us wrote:
 
 On Tue, May 05, 2015 at 06:54:37PM +, Stuart Henderson wrote:
 On 2015-05-05, Jack Peirce jpei...@sourcecode.com wrote:
 On Mon, May 04, 2015 at 08:22:28PM -0400, Steve Shockley wrote:
 Does anyone know if the Dell PERC S300 controller will work under 
 OpenBSD as a non-RAID SAS HBA?  It has an LSI SAS 1068e, but I didn't 
 know if they did something to make it not work as an HBA.  Thanks.
 
 I don't believe the controller will automatically export unconfigured
 drives as single drive units. LSI makes 2 different versions of 
 firmware for the unbranded controllers, IR mode for RAID and IT mode 
 for HBA, but it's not possible/easy to flash them to the Dell branded 
 controllers.
 
 Create RAID0 single drive units on each disk and it should export.
 
 
 
 AFAIK the S300 doesn't work at all on OpenBSD (or Linux). It was only ever
 meant to work with Windows.
 
 
 The Dell PERC S300 is a SWRAID product. It is correct that the hardware is 
 an LSI1068e, but programmed with modified PCI IDs (all four are different: 
 vendor, device, sub-vendor, sub-device). The expansion ROM and drivers are 
 from DotHill systems and are looking for these updated IDs. The controller 
 itself is running the IR/IT firmware with IR soft-disabled. To turn the 
 controller back into a normal LSI1068e you would need to update the expansion 
 ROM and the PCI IDs.
 
 As a science experiment you might be able to modify mpi(4) to look for the 
 S300 IDs, but that would be an OS runtime only fix.
 

im pretty sure the s300 is actually the ahci ports coming off the motherboard. 
if its in ahci mode it should Just Work(tm) as a sata controller. not sas, 
sorry.

the h200 was the last straight sas hba you could get in a dell. if you want sas 
ports in their more recent machines you can configure physical disks on a h330 
or h730, both of which are mfii controllers.

dlg



Re: Not Detecting Broadcom NetXtreme II 10GBase-T adapter

2015-03-10 Thread David Gwynne
i havent written a driver for it yet.

 On 10 Mar 2015, at 10:07 pm, Ninad Shaha ninadsh...@iitb.ac.in wrote:
 
 Dear All,
 
 I have installed OpenBSD 5.6 on IBM X3650 M4 server. This server 
 contains two Broadcom NetXtreme II BCM57712 10GBase-T dual port 
 adapters. These adapters are not visible or detected by OpenBSD. They just 
 show up as not configured in dmesg.
 
 Following is the dmesg output from above server. Please guide me for the 
 same as I am new to BSD.
 
 OpenBSD 5.6 (GENERIC.MP) #333: Fri Aug  8 00:20:21 MDT 2014
 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
 RTC BIOS diagnostic error 80clock_battery
 real mem = 34315374592 (32725MB)
 avail mem = 33393123328 (31846MB)
 mpath0 at root
 scsibus0 at mpath0: 256 targets
 mainbus0 at root
 bios0 at mainbus0: SMBIOS rev. 2.7 @ 0x7e7be000 (82 entries)
 bios0: vendor IBM version -[VVE142AUS-1.70]- date 06/04/2014
 bios0: IBM 00Y7683
 acpi0 at bios0: rev 2
 acpi0: sleep states S0 S5
 acpi0: tables DSDT FACP TCPA ERST HEST HPET APIC MCFG OEM0 OEM1 SLIT 
 SRAT SLIC SSDT SSDT SSDT SSDT SSDT SSDT SSDT DMAR
 acpi0: wakeup devices MRP1(S4) DCC0(S4) ENET(S4) MRP3(S4) MRP5(S4) 
 EHC2(S5) PEX0(S5) PEX7(S5) EHC1(S5) IP2P(S3) MRPB(S4) MRPC(S4) MRPD(S4) 
 MRPM(S4) MRPE(S4) MRPF(S4) [...]
 acpitimer0 at acpi0: 3579545 Hz, 24 bits
 acpihpet0 at acpi0: 14318179 Hz
 acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
 cpu0 at mainbus0: apid 0 (boot processor)
 cpu0: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 3000.46 MHz
 cpu0: 
 FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,LONG,LAHF,PERF,ITSC
 cpu0: 256KB 64b/line 8-way L2 cache
 cpu0: smt 0, core 0, package 0
 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
 cpu0: apic clock running at 99MHz
 cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
 cpu1 at mainbus0: apid 2 (application processor)
 cpu1: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 3000.00 MHz
 cpu1: 
 FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,LONG,LAHF,PERF,ITSC
 cpu1: 256KB 64b/line 8-way L2 cache
 cpu1: smt 0, core 1, package 0
 cpu2 at mainbus0: apid 4 (application processor)
 cpu2: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 3000.00 MHz
 cpu2: 
 FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,LONG,LAHF,PERF,ITSC
 cpu2: 256KB 64b/line 8-way L2 cache
 cpu2: smt 0, core 2, package 0
 cpu3 at mainbus0: apid 6 (application processor)
 cpu3: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 3000.00 MHz
 cpu3: 
 FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,LONG,LAHF,PERF,ITSC
 cpu3: 256KB 64b/line 8-way L2 cache
 cpu3: smt 0, core 3, package 0
 cpu4 at mainbus0: apid 8 (application processor)
 cpu4: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 3000.00 MHz
 cpu4: 
 FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,LONG,LAHF,PERF,ITSC
 cpu4: 256KB 64b/line 8-way L2 cache
 cpu4: smt 0, core 4, package 0
 cpu5 at mainbus0: apid 10 (application processor)
 cpu5: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 3000.00 MHz
 cpu5: 
 FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,LONG,LAHF,PERF,ITSC
 cpu5: 256KB 64b/line 8-way L2 cache
 cpu5: smt 0, core 5, package 0
 cpu6 at mainbus0: apid 12 (application processor)
 cpu6: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 3000.00 MHz
 cpu6: 
 FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,LONG,LAHF,PERF,ITSC
 cpu6: 256KB 64b/line 8-way L2 cache
 cpu6: smt 0, core 6, package 0
 cpu7 at mainbus0: apid 14 (application processor)
 cpu7: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 3000.00 MHz
 cpu7: 
 

Re: Emulate apache mod_rewrite feature with new httpd

2015-02-22 Thread David Gwynne
 On 22 Feb 2015, at 20:23, Romain FABBRI romain.fab...@alienconsulting.net wrote:
 
 I've managed to configure the new httpd server to use as a replacement for 
 apache. (Which is really great. Thanks to Reyk!)
 
 I'm struggling to make my drupal site work, because of the clean url module.
 
 I used to have the following apache mod_rewrite configuration :
 RewriteEngine on
 RewriteBase /
 RewriteCond %{REQUEST_FILENAME} !-f
 RewriteCond %{REQUEST_FILENAME} !-d
 RewriteRule ^(.*)$ /index.php?q=$1 [L,QSA]
 
 Basically the rule means that if the file or folder is not found, then the 
 request is rewritten to /index.php?q=request
 For example, if /user doesn't exist, the url is rewritten to /index.php?q=user
 
 I've looked into man page for httpd and I've seen that the block return 
 statement might be of use to emulate this need. but I haven't found many info 
 on the subject.
 
 Has someone found a way to make that with the new httpd server ?
 
 PS : I'm running from snapshot (5.7 GENERIC#716 i386)
 
 Romain

i havent tried drupal behind httpd yet, but if i did i would unconditionally 
route requests into the drupal controller (index.php), and use a cdn module to 
have drupal generate urls to static assets (ie, the css/js/image files on disk) 
against a separate domain or url prefix. or you could write a simple module 
that takes advantage of hook_file_url_alter. that has greatly simplified our 
configs in the frontend web servers in front of our drupal poop.



Re: YP Alternative

2015-01-04 Thread David Gwynne
 On 4 Jan 2015, at 5:32 pm, Brian Empson br...@teamhandbanana.com wrote:
 
 This sounds interesting. What would you replace krb5 with, if you don't mind 
 me asking? I was contemplating krb5, but the setup and such is a pain for me 
 (because I am not familiar with it). I'll probably wind up rolling something 
 custom with LDAP and YP mappings thrown in.

i dunno. ideally i would just do basic auth over https against something that 
just returns 200 or 403. bsdauth on openbsd means i could probably implement 
that with a crappy script. linux probably has a crazy pam module i could use to 
do auth with http, but the solarish things i run almost certainly don't.

however, linux and solaris still support krb5 auth out of the box, so it's only 
a problem i really have to solve on openbsd. or use ldap auth.

 
 On 1/4/2015 2:26 AM, David Gwynne wrote:
 On 2 Jan 2015, at 9:52 pm, Brian Empson br...@teamhandbanana.com wrote:
 
 I'm looking into a way to sync up group and user information across a 
 network of OpenBSD machines. I like YP, except that I don't need the 
 password hashes transferred across the network. I like that it's built 
 right into the base install, are there better ways to handle synchronizing 
 login details across multiple machines that is built into the base install? 
 Preferably written by the OpenBSD team, too?
 while not directly answering your question, i can say openbsd can do this 
 kind of stuff without yp on the wire.
 
 at work i use ypldap to get user/group information from active directory. we 
 populate the rfc2307 attributes on our users and groups to make them useful 
 on unix systems. we use the single directory as a name service backend for 
 openbsd, solaris, linux, and windows (of course).
 
 we're still using krb5 for password authentication. i really have to fix 
 that.
 
 we've also augmented the AD schema to store users ssh keys in the directory 
 too. sshd gets access to them via AuthorizedKeysCommand and a perl script. 
 this allows ssh key based single sign on across all our unixish systems, 
 even if their home directories are not available on the system. this is 
 useful for providing services over ssh. an example of such a service we 
 provide is svn and git on a dedicated server. all our users are on the 
 system via ypldap, and they can auth using their own username and either a 
 password or ssh key.
 
 dlg



Re: YP Alternative

2015-01-04 Thread David Gwynne
 On 5 Jan 2015, at 06:14, Jiri B ji...@devio.us wrote:
 
 On Sun, Jan 04, 2015 at 06:40:09PM +1000, David Gwynne wrote:
 i dunno. ideally i would just do basic auth over https against something 
 that just returns 200 or 403. bsdauth on openbsd means i could probably 
 implement that with a crappy script. linux probably has a crazy pam module i 
 could use to do auth with http, but the solarish things i run almost 
 certainly don't.
 
 Did you mean this as SSO solution?

which sso are you talking about? if you mean same sign-on, then yes. my users 
only have to know a single username and password on our infrastructure.



Re: YP Alternative

2015-01-03 Thread David Gwynne
 On 2 Jan 2015, at 9:52 pm, Brian Empson br...@teamhandbanana.com wrote:
 
 I'm looking into a way to sync up group and user information across a network 
 of OpenBSD machines. I like YP, except that I don't need the password hashes 
 transferred across the network. I like that it's built right into the base 
 install, are there better ways to handle synchronizing login details across 
 multiple machines that is built into the base install? Preferably written by 
 the OpenBSD team, too?

while not directly answering your question, i can say openbsd can do this kind 
of stuff without yp on the wire.

at work i use ypldap to get user/group information from active directory. we 
populate the rfc2307 attributes on our users and groups to make them useful on 
unix systems. we use the single directory as a name service backend for 
openbsd, solaris, linux, and windows (of course).

we're still using krb5 for password authentication. i really have to fix that.

we've also augmented the AD schema to store users ssh keys in the directory 
too. sshd gets access to them via AuthorizedKeysCommand and a perl script. this 
allows ssh key based single sign on across all our unixish systems, even if 
their home directories are not available on the system. this is useful for 
providing services over ssh. an example of such a service we provide is svn and 
git on a dedicated server. all our users are on the system via ypldap, and they 
can auth using their own username and either a password or ssh key.

dlg



Re: ixgbe_tx_ctx_setup crash

2014-12-28 Thread David Gwynne
 On 27 Dec 2014, at 6:09 pm, Kapetanakis Giannis bil...@edu.physics.uoc.gr 
 wrote:
 
 On 27/12/14 10:05, Kapetanakis Giannis wrote:
 On 26/12/14 12:23, Kapetanakis Giannis wrote:
 Hi,
 
 Any ideas on this? I'm getting at least one panic every day.
 
 G
 
 On 24/12/14 06:13, Kapetanakis Giannis wrote:
 Today I installed a 10Gb adapter and upgraded to the latest snapshot.
 I've had a crash...
 
 Machine is a Fujitsu RX300 S6 and the adapter is an Intel X520 SR1
 
 G
 
 ddb{0}> trace
 ixgbe_tx_ctx_setup(d4164980,d919df00,f55a0e5c,f55a0e60,4) at ixgbe_tx_ctx_setup+0x11a
 ixgbe_encap(d4164980,d919df00,0,1000,a) at ixgbe_encap+0x16a
 ixgbe_start(d4376030,d424f2c0,1,f55a0ed4,d43ac320) at ixgbe_start+0xa7
 nettxintr(0,50,0,f55a0f08,d055e282) at nettxintr+0x47
 softintr_dispatch(1) at softintr_dispatch+0x5a
 Xsoftnet() at Xsoftnet+0x17
 --- interrupt ---
 cpu_idle_cycle(d0c54a00) at cpu_idle_cycle+0xf
 Bad frame pointer: 0xd0d1ee58
 
 I'm still getting the crash at the same point.
 I've followed http://www.benzedrine.cx/crashreport.html
 and I have managed to trace it to line 2114 of revision 1.113
 
  2107  /* Set the ether header length */
  2108  vlan_macip_lens |= ehdrlen << IXGBE_ADVTXD_MACLEN_SHIFT;
  2109
  2110  switch (etype) {
  2111  case ETHERTYPE_IP:
  2112  ip = (struct ip *)(mp->m_data + ehdrlen);
  2113  ip_hlen = ip->ip_hl << 2;
 2114  ipproto = ip->ip_p;
  2115  type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV4;
  2116  break;
 
 I've also tried reverting to older revisions of if_ix.c (back to 1.108) 
 with no luck.
 I will continue a bit more to see if anything changes.
 
 regards,
 
 Giannis
 
 Just to add that I'm running vlan 802.1q trunk on this interface.

that's pretty cool.

what kind of traffic are you sending on this interface? is it forwarded traffic 
or locally generated?

cheers,
dlg



Re: ixgbe_tx_ctx_setup crash

2014-12-28 Thread David Gwynne
On Sun, Dec 28, 2014 at 08:00:57PM +1000, David Gwynne wrote:
 
  On 27 Dec 2014, at 6:09 pm, Kapetanakis Giannis bil...@edu.physics.uoc.gr 
  wrote:
  
  On 27/12/14 10:05, Kapetanakis Giannis wrote:
  On 26/12/14 12:23, Kapetanakis Giannis wrote:
  Hi,
  
  Any ideas on this? I'm getting at least one panic every day.
  
  G
  
  On 24/12/14 06:13, Kapetanakis Giannis wrote:
  Today I installed a 10Gb adapter and upgraded to the latest snapshot.
  I've had a crash...
  
  Machine is a Fujitsu RX300 S6 and the adapter is an Intel X520 SR1
  
  G
  
  ddb{0}> trace
  ixgbe_tx_ctx_setup(d4164980,d919df00,f55a0e5c,f55a0e60,4) at ixgbe_tx_ctx_setup+0x11a
  ixgbe_encap(d4164980,d919df00,0,1000,a) at ixgbe_encap+0x16a
  ixgbe_start(d4376030,d424f2c0,1,f55a0ed4,d43ac320) at ixgbe_start+0xa7
  nettxintr(0,50,0,f55a0f08,d055e282) at nettxintr+0x47
  softintr_dispatch(1) at softintr_dispatch+0x5a
  Xsoftnet() at Xsoftnet+0x17
  --- interrupt ---
  cpu_idle_cycle(d0c54a00) at cpu_idle_cycle+0xf
  Bad frame pointer: 0xd0d1ee58
  
  I'm still getting the crash at the same point.
  I've followed http://www.benzedrine.cx/crashreport.html
  and I have managed to trace it to line 2114 of revision 1.113
  
   2107  /* Set the ether header length */
   2108  vlan_macip_lens |= ehdrlen << IXGBE_ADVTXD_MACLEN_SHIFT;
   2109
   2110  switch (etype) {
   2111  case ETHERTYPE_IP:
   2112  ip = (struct ip *)(mp->m_data + ehdrlen);
   2113  ip_hlen = ip->ip_hl << 2;
  2114  ipproto = ip->ip_p;
   2115  type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV4;
   2116  break;
  
  I've also tried reverting to older revisions of if_ix.c (back to 1.108) 
  with no luck.
  I will continue a bit more to see if anything changes.
  
  regards,
  
  Giannis
  
  Just to add that I'm running vlan 802.1q trunk on this interface.
 
 that's pretty cool.
 
 what kind of traffic are you sending on this interface? is it forwarded 
 traffic or locally generated?
 
 cheers,
 dlg

can you try this diff? it should apply with a little fuzz.

Index: if_ix.c
===
RCS file: /cvs/src/sys/dev/pci/if_ix.c,v
retrieving revision 1.113
diff -u -p -r1.113 if_ix.c
--- if_ix.c 22 Dec 2014 02:28:52 -0000  1.113
+++ if_ix.c 28 Dec 2014 11:53:00 -0000
@@ -2035,14 +2049,13 @@ ixgbe_tx_ctx_setup(struct tx_ring *txr, 
struct ix_softc *sc = txr->sc;
struct ixgbe_adv_tx_context_desc *TXD;
struct ixgbe_tx_buf *tx_buffer;
+   struct ether_header eh;
 #if NVLAN > 0
-   struct ether_vlan_header *eh;
-#else
-   struct ether_header *eh;
+   struct ether_vlan_header evh;
 #endif
-   struct ip *ip;
+   struct ip ip;
 #ifdef notyet
-   struct ip6_hdr *ip6;
+   struct ip6_hdr ip6;
 #endif
uint32_t vlan_macip_lens = 0, type_tucmd_mlhl = 0;
int ehdrlen, ip_hlen = 0;
@@ -2089,19 +2102,21 @@ ixgbe_tx_ctx_setup(struct tx_ring *txr, 
 * Jump over vlan headers if already present,
 * helpful for QinQ too.
 */
+   if (mp->m_pkthdr.len < sizeof(eh))
+   return (1);
+
+   m_copydata(mp, 0, sizeof(eh), (caddr_t)&eh);
+   etype = ntohs(eh.ether_type);
+   ehdrlen = ETHER_HDR_LEN;
 #if NVLAN > 0
-   eh = mtod(mp, struct ether_vlan_header *);
-   if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
-   etype = ntohs(eh->evl_proto);
-   ehdrlen = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
-   } else {
-   etype = ntohs(eh->evl_encap_proto);
-   ehdrlen = ETHER_HDR_LEN;
+   if (etype == ETHERTYPE_VLAN) {
+   if (mp->m_pkthdr.len < sizeof(evh))
+   return (1);
+
+   m_copydata(mp, 0, sizeof(evh), (caddr_t)&evh);
+   etype = ntohs(evh.evl_proto);
+   ehdrlen = sizeof(evh);
}
-#else
-   eh = mtod(mp, struct ether_header *);
-   etype = ntohs(eh->ether_type);
-   ehdrlen = ETHER_HDR_LEN;
 #endif
 
/* Set the ether header length */
@@ -2109,17 +2124,23 @@ ixgbe_tx_ctx_setup(struct tx_ring *txr, 
 
switch (etype) {
case ETHERTYPE_IP:
-   ip = (struct ip *)(mp->m_data + ehdrlen);
-   ip_hlen = ip->ip_hl << 2;
-   ipproto = ip->ip_p;
+   if (mp->m_pkthdr.len < ehdrlen + sizeof(ip))
+   return (1);
+
+   m_copydata(mp, ehdrlen, sizeof(ip), (caddr_t)&ip);
+   ip_hlen = ip.ip_hl << 2;
+   ipproto = ip.ip_p;
type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV4;
break;
 #ifdef notyet
case ETHERTYPE_IPV6:
-   ip6 = (struct ip6_hdr *)(mp->m_data + ehdrlen);
-   ip_hlen = sizeof(struct ip6_hdr);
+   if (mp->m_pkthdr.len < ehdrlen + sizeof(ip6))
+   return (1

Re: Dell R630 high interrupts on acpi0

2014-12-15 Thread David Gwynne
 On 16 Dec 2014, at 15:16, Jonathan Matthew jonat...@d14n.org wrote:
 
 On Sun, Dec 14, 2014 at 06:22:37PM +0100, Hrvoje Popovski wrote:
 Hi all,
 
 I have got two new Dell R630s running -current from Sun Dec
 14 15:07:17. Installation went great and very fast.
 The problem is that I see around 11k interrupts on acpi0. At first I
 thought the problem was similar to this thread
 http://marc.info/?l=openbsd-misc&m=140551906923931&w=2
 
 But whether the system profile in the Dell BIOS is set to Performance
 or to DAPC, there are always interrupts on acpi0.
 In the links below you can find acpidump and dmesg output for the
 Performance and DAPC settings in the Dell BIOS.
 
 We just got some r630s too, so I spent some time last week figuring out what's
 going on here.  Something in the AML wants to talk to the intel MEI device.
 Normally this works, but on the new generation of dell machines (we've seen it
 on r630s and r730s), it's been moved outside the pci memory range we currently
 allow on amd64.  You can see this in your dmesgs:
 
 0:22:0: mem address conflict 0x3303000/0x10
 0:22:1: mem address conflict 0x3302000/0x10
 
 The interrupt will keep triggering until it manages to talk to the device,
 which will never happen.

we've also experienced this on r720s configured in Performance mode in the BIOS. 
others have hit this on r620s as well (look for "Dell R620 high ACPI Interrupt 
rate" on misc@). the diff below fixes them too.

dlg

 
 kettenis@ says we can get the pci memory range information we need to deal
 with this from acpi.  Until that happens, expanding the allowed pci memory
 range makes things work properly.
 
 ok?
 
 
 Index: pci_machdep.c
 ===
 RCS file: /cvs/src/sys/arch/amd64/pci/pci_machdep.c,v
 retrieving revision 1.59
 diff -u -p -u -p -r1.59 pci_machdep.c
 --- pci_machdep.c 19 Apr 2014 11:53:42 -0000  1.59
 +++ pci_machdep.c 16 Dec 2014 04:21:53 -0000
 @@ -622,13 +622,17 @@ pci_init_extents(void)
* here.  As long as vendors continue to support
* 32-bit operating systems, we should never see BARs
* outside that region.
 +  *
 +  * Dell 13G servers have important devices outside the
 +  * 36-bit address space.  Until we can extract the address
 +  * ranges from acpi, expand the allowed range to suit.
*/
   pcimem_ex = extent_create(pcimem, 0, 0xUL,
   M_DEVBUF, NULL, 0, EX_NOWAIT);
   if (pcimem_ex == NULL)
   return;
 - extent_alloc_region(pcimem_ex, 0x10UL,
 - 0xfff0UL, EX_NOWAIT);
 + extent_alloc_region(pcimem_ex, 0x400UL,
 + 0xfc00UL, EX_NOWAIT);
 
   for (bmp = bios_memmap; bmp->type != BIOS_MAP_END; bmp++) {
   /*



Re: bridge + vlan broke after 5.5 5.6 upgrade

2014-11-03 Thread David Gwynne
 On 4 Nov 2014, at 06:41, Pieter Verberne pieterverbe...@xs4all.nl wrote:
 
 On 2014-11-02 13:51, Jorge Schrauwen wrote:
 Hey All,
 TL;DR: traffic leaving a bridge over a vlan does
 not get tagged but leaves untagged after upgrade.
 Is this by design?
 Looks exactly like my problem. Running 5.6 release.

bridge(4) puts frames on the wire by calling the outgoing interface's start 
routine, which in this case is vlan_start() because you're bridging vlan(4) 
interfaces.

mpi@ and weerd@ correctly identified the diff where henning@ changed 
vlan_start(). he assumed that ether_output is always called before vlan_start, 
and moved the tagging code into ether_output to make injecting the vlan tag 
more streamlined.

bridge obviously breaks this assumption cos it just shoves the packet into 
vlan_start() which then just shoves the packet onto the parent interface.

i have a massive headache and sleep deficit right now so i'm not going to 
suggest a way to fix this.

dlg



Re: Making tftp download large files from tftpd

2014-10-21 Thread David Gwynne
your diff got stripped when sending to this list. i did a fix which has now 
been committed to the tree as src/usr.bin/tftp/tftp.c r1.24.

thanks for the report :)

dlg

 On 21 Oct 2014, at 10:28, Justin Mayes jma...@careered.com wrote:
 
 I could. My original problem was with the cisco rommon tftpdnld command, as a 
 client, failing to talk to tftpd. I just noticed the tftp client problem while 
 testing locally. After this I intend to go back and make tftpd work with 
 whatever the cisco client is doing. Since that's a two byte field in the rfc, 
 there is no way I know of that tftpd or any other server can put more than 
 65535 in there, so all they can do is roll over. The only thing I can think 
 of is that maybe the cisco client starts at 1 rather than 0. A tcpdump will 
 tell me in a little while. This is more of a learning experience for me. I 
 want to go through the motions of getting the source, debugging an issue with 
 gdb, updating the code, building and all that. I've done that many times in 
 the windows world but not in any unix-like OSes. So far the exercise is a 
 success in that I learned a ton, and if that diff was worth anything to 
 anyone, even better. Thanks for the tip though James, it's good advice.
 
 J
 -Original Message-
 From: James A. Peltier [mailto:jpelt...@sfu.ca] 
 Sent: Monday, October 20, 2014 5:34 PM
 To: Justin Mayes
 Cc: misc@openbsd.org
 Subject: Re: Making tftp download large files from tftpd
 
 - Original Message -
 | I will spare you all the backstory, but I found that tftp could not 
 | download files over 32 MB by default from tftpd. I know you can pass a 
 | blocksize to tftpd to handle much larger files, but I was originally 
 | working with a client where this wasn't possible. The TFTP protocol has 
 | 2 bytes for the block number, which puts a 65535 limit on it. tftpd 
 | doesn't care and will just roll it over back to 0 and keep sending 
 | data. The tftp client fails when the block number rolls over because it 
 | tracks the blocks with an int, so it ends up comparing its block 
 | counter, which is now at 65536, to what comes off the network (0) and 
 | quits. I updated the tftp client code to use the same data type as the 
 | network-side structs use, u_int16_t. Now the tftp counter rolls along 
 | with the server, and the server can send a file of any size with or 
 | without a blocksize change. I feel like this is mostly pointless, but it 
 | doesn't hurt anything. I will gladly provide the actual diffs. I have to 
 | look into that process for openbsd, but just wanted to check with the 
 | group first in case there was a reason an int was used that I do not 
 | understand.
 | 
 | J
 
 Or you could chainload iPXE to allow for the downloading of your file over 
 HTTP which is much faster than TFTP to begin with.  This is indeed what we do.
 
 --
 James A. Peltier
 IT Services - Research Computing Group
 Simon Fraser University - Burnaby Campus
 Phone   : 778-782-6573
 Fax : 778-782-3045
 E-Mail  : jpelt...@sfu.ca
 Website : http://www.sfu.ca/itservices
 Twitter : @sfu_rcg
 Powering Engagement Through Technology



Re: Periodic DNS resolution

2014-09-28 Thread David Gwynne
i run some scripts out of cron (and from rc.local on boot) to keep some tables 
in sync with DNS. the scripts are perl so i can use Net::DNS, but apart from 
that it's been pretty robust and straightforward.

dlg

On 29 Sep 2014, at 11:44, Austin Gilbert austin.gilb...@gmail.com wrote:

 A problem that seems to come up over and over again with egress filtering 
 firewalls is sites that move IPs, so the names need to be resolved 
 periodically and the rules updated. 
 
 I recently migrated to using pf and was wondering if anyone had suggestions 
 for how they tackle this problem.
 
 I realize I could create cron jobs with scripts to do this, but was curious 
 if there was a better way.
 
 I'd eventually like to track changes and log them as well.
 
 Cheers,
 Austin



Re: 5.5 panic in ffs_blkfree

2014-08-25 Thread David Gwynne
you can run "show panic" in ddb if you weren't watching the console for the 
panic string.

On 25 Aug 2014, at 4:39 am, Roger Hammerstein cheek...@live.com wrote:

 Sorry, I could not get the panic string.
 
 ddb{3}> trace
 Debugger() at Debugger+0x5
 panic() at panic+0xee
 ffs_blkfree() at ffs_blkfree+0x717
 ffs_indirtrunc() at ffs_indirtrunc+0x2ac
 ffs_indirtrunc() at ffs_indirtrunc+0x28e
 ffs_truncate() at ffs_truncate+0xb45
 ufs_inactive() at ufs_inactive+0x109
 VOP_INACTIVE() at VOP_INACTIVE+0x28
 vput() at vput+0x3e
 ufs_rename() at ufs_rename+0xdb0
 VOP_RENAME() at VOP_RENAME+0x3b
 dorenameat() at dorenameat+0x249
 syscall() at syscall+0x24f
 --- syscall (number 128) ---
 end trace frame: 0x0, count: -13
 0xe66317e083a:
 ddb{3}>
 
 
 Dell R310, DNS server running the isc-bind port with rotating logs on /var,
 no softupdates.
 
 
 
 
 
 OpenBSD  5.5 GENERIC.MP#315 amd64
 OpenBSD 5.5 (GENERIC.MP) #315: Wed Mar  5 09:37:46 MST 2014
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
 real mem = 17153232896 (16358MB)
 avail mem = 16688005120 (15914MB)
 mainbus0 at root
 bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xbf79c000 (66 entries)
 bios0: vendor Dell Inc. version 1.6.4 date 03/03/2011
 bios0: Dell Inc. PowerEdge R310



Re: Broadcom BCM5709 and BCM57711 driver features

2014-07-25 Thread David Gwynne
On 24 Jul 2014, at 19:37, def d...@fromru.com wrote:

 Hi!
 
 Currently using 5.5-stable, and it seems (as per hwfeatures) that the driver 
 for the BCM5709 (1GE dual port adapter) doesn't support jumbo frames at all, 
 which is critical for activating mpls on bnx.
 The card itself supports jumbo frames.
 ifconfig returns "invalid argument" when I try to set up jumbo frames.
 Is there a way to reach the higher MTU values?

yes. from memory it just required the use of vi and make.

 Also, a simple question: is a driver for the Broadcom 10GE dual port adapter 
 BCM57711 available?
 I can't see the card detected in dmesg, but I found via google that someone 
 has seen it.

i started working on that and got distracted.

i'll see if i can dig the bnx jumbo diff out. it won't make 5.6 but you can try 
it out if you want.



Re: 'newer' Qlogic HBA support on amd64

2014-05-16 Thread David Gwynne
hey pete,

could you try enabling the emc driver too?

cheers,
dlg

On 16 May 2014, at 7:47 pm, Pete Vickers peter.vick...@gmail.com wrote:

 Hi,
 
 Sorry for the delay. I finally upgraded the box (very quick and easy process 
 - nice) and the HBA is now attached by the qle driver. However, while it 
 'sees' the SAN disk behind it, it remains unable to talk to it.
 
 
 # uname -mrv 
 5.5 GENERIC.MP#315 amd64
 
 
 # dmesg | egrep -i 'qle|scsibus1'
 qle0 at pci8 dev 0 function 0 QLogic ISP2432 rev 0x02: msi
 qle0: bad startup mboxes: 0 0
 qle0: firmware rev 4.0.20, attrs 0x2
 scsibus1 at qle0: 2048 targets, WWPN 50060b66644e, WWNN 50060b66644f
 sd1 at scsibus1 targ 130 lun 0: DGC, RAID 5, 0223 SCSI2 0/direct fixed 
 naa.600601601b662700d837603da8efe011
 sd2 at scsibus1 targ 131 lun 0: DGC, RAID 5, 0223 SCSI2 0/direct fixed 
 naa.600601601b662700d837603da8efe011
 
 
 sd1 & sd2: Are these duplicates due to redundant paths in the SAN fabric?
 
 
 # fdisk sd1 
 fdisk: DIOCGPDINFO: Input/output error
 fdisk: Can't get disk geometry, please use [-chs] to specify.
 
 
 
 # pcidump -v 19:0:0
 19:0:0: QLogic ISP2432
0x0000: Vendor ID: 1077 Product ID: 2432
0x0004: Command: 0147 Status: 0010
0x0008: Class: 0c Subclass: 04 Interface: 00 Revision: 02
0x000c: BIST: 00 Header Type: 00 Latency Timer: 00 Cache Line Size: 10
0x0010: BAR io addr: 0x5000/0x0100
0x0014: BAR mem 64bit addr: 0xfdff/0x4000
0x001c: BAR empty ()
0x0020: BAR empty ()
0x0024: BAR empty ()
0x0028: Cardbus CIS: 
0x002c: Subsystem Vendor ID: 103c Product ID: 7040
0x0030: Expansion ROM Base Address: 
0x0038: 
0x003c: Interrupt Pin: 01 Line: 07 Min Gnt: 00 Max Lat: 00
0x0044: Capability 0x01: Power Management
0x004c: Capability 0x10: PCI Express
Link Speed: 2.5 / 2.5 GT/s Link Width: x4 / x4
0x0064: Capability 0x05: Message Signaled Interrupts (MSI)
0x0074: Capability 0x03: Vital Product Data (VPD)
0x007c: Capability 0x11: Extended Message Signaled Interrupts (MSI-X)
 
 e.g. http://filedownloads.qlogic.com/files/datasheets/32359/83432-580-00D.pdf
 
 
 
 (let me know if you want list spam with full dmesg).
 
 
 /Pete
 
 
 On 13 March 2014, at 18:48, Ted Unangst t...@tedunangst.com wrote:
 
 On Thu, Mar 13, 2014 at 18:44, Pete Vickers wrote:
 Hi,
 I have an amd64 server (HP DL360 G5) with a QLogic FC HBA in it. It
 appears to be based on the ISP2400 series, and the isp man page says the
 driver only supports up to the ISP2300 series. However, the driver appears
 to try to attach the device regardless (and fails). Does anyone know how
 different the 2400 series is, or if there is work in progress to support
 it?
 
 In 5.5 and later, that's supported by the qle driver. The isp driver
 is being broken into parts (qlw, qla, qle) depending on generation.
 I'd try a snapshot. It should work better. And if it doesn't work,
 we'd like to know.



Re: uvm_fault on resume with athn(4)

2013-11-22 Thread David Gwynne
hey josh,

this should be fixed in src/sys/dev/pci/if_athn_pci.c r1.13.

sorry for the inconvenience, but thank you for the report, especially the
backtrace.

cheers,
dlg


On 23 November 2013 16:37, Josh Grosse j...@jggimi.homeip.net wrote:

 Summary:  with src/sys/dev/pci/if_athn_pci.c at revision 1.12, suspend/resume
 will produce a uvm_fault on resume.  I cannot reproduce the panic if I revert
 to revision 1.11.

 Of note: ddb(4) produces a brief traceback and a prompt but is inoperative.
 I am unable to get a dump if ddb.panic=0.  This traceback was transposed
 by hand.

 uvm_fault(0xd0b1a860, 0x0, 0, 1) -> e
 kernel: page fault trap, code=0
 Stopped at  mtx_enter+0x6:  movl    0x4(%ecx),%eax
 mtx_enter(10,8002,50,d1fea000,d1fc6a80) at mtx_enter+0x6
 task_add(0,d1fec088,f5bc7e1c,d03cfde5,d1fea000) at task_add+0x20
 athn_pci_activate(d1fea000,3,f5bc7e1c,d0597b3e,d1fc6a80) at athn_pci_activate+0x2b
 config_activate_children(d1fc6a80,3,f5bc7e4c,d059582c,0) at config_activate_children+0x45
 config_activate_children(d1fb6f00,3,4,100106,f5bc7e7c) at config_activate_children+0x45
 ppbactivate(d1fb6f00,3,f5bc7ebc,d0597b3e,d1e7b900) at ppbactivate+0x289
 config_activate_children(d1e7b900,3,0,3,0) at config_activate_children+0x45
 config_activate_children(d1f67000,3,0,c731,3) at config_activate_children+0x45
 acpi_sleep_state(d1e7a400,3,f5bc7f5c,d0ecb31a,d205e570) at acpi_sleep_state+0x2c3
 acpi_sleep_task(d1e7a400,3,d6efe91c,1,d1e7a400) at acpi_sleep_task+0x1a
 ddb{0}>

 OpenBSD 5.4-current (GENERIC.MP) #141: Thu Nov 21 15:03:32 MST 2013
 dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP
 cpu0: Intel(R) Atom(TM) CPU N270 @ 1.60GHz (GenuineIntel 686-class) 1.60 GHz
 cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,SSE3,DTES64,MWAIT,DS-CPL,EST,TM2,SSSE3,xTPR,PDCM,MOVBE,LAHF,PERF
 real mem  = 1064497152 (1015MB)
 avail mem = 1035247616 (987MB)
 mainbus0 at root
 bios0 at mainbus0: AT/286+ BIOS, date 04/18/11, BIOS32 rev. 0 @ 0xf0010, SMBIOS rev. 2.5 @ 0xf0720 (30 entries)
 bios0: vendor American Megatrends Inc. version 1601 date 04/18/2011
 bios0: ASUSTeK Computer INC. 1005HA
 acpi0 at bios0: rev 0
 acpi0: sleep states S0 S3 S4 S5
 acpi0: tables DSDT FACP APIC MCFG OEMB HPET SSDT
 acpi0: wakeup devices P0P2(S4) P0P1(S4) HDAC(S4) P0P4(S4) P0P8(S4) P0P5(S4) P0P7(S4) P0P9(S4) P0P6(S4)
 acpitimer0 at acpi0: 3579545 Hz, 24 bits
 acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
 cpu0 at mainbus0: apid 0 (boot processor)
 cpu0: apic clock running at 133MHz
 cpu0: mwait min=64, max=64, C-substates=0.2.2.0.2, IBE
 cpu1 at mainbus0: apid 1 (application processor)
 cpu1: Intel(R) Atom(TM) CPU N270 @ 1.60GHz (GenuineIntel 686-class) 1.60 GHz
 cpu1: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,SSE3,DTES64,MWAIT,DS-CPL,EST,TM2,SSSE3,xTPR,PDCM,MOVBE,LAHF,PERF
 ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins
 ioapic0: misconfigured as apic 1, remapped to apid 2
 acpimcfg0 at acpi0 addr 0xe000, bus 0-63
 acpihpet0 at acpi0: 14318179 Hz
 acpiprt0 at acpi0: bus 0 (PCI0)
 acpiprt1 at acpi0: bus 2 (P0P5)
 acpiprt2 at acpi0: bus 1 (P0P7)
 acpiprt3 at acpi0: bus -1 (P0P6)
 acpiec0 at acpi0
 acpicpu0 at acpi0: C3, C2, C1, PSS
 acpicpu1 at acpi0: C3, C2, C1, PSS
 acpitz0 at acpi0: critical temperature is 88 degC
 acpibat0 at acpi0: BAT0 model 1005HA serial   type LION oem ASUS
 acpiac0 at acpi0: AC unit offline
 acpiasus0 at acpi0
 acpibtn0 at acpi0: LID_
 acpibtn1 at acpi0: SLPB
 acpibtn2 at acpi0: PWRB
 bios0: ROM list: 0xc/0xec00!
 cpu0: Enhanced SpeedStep 1600 MHz: speeds: 1600, 1333, 1067, 800 MHz
 pci0 at mainbus0 bus 0: configuration mode 1 (bios)
 pchb0 at pci0 dev 0 function 0 Intel 82945GME Host rev 0x03
 vga1 at pci0 dev 2 function 0 Intel 82945GME Video rev 0x03
 intagp0 at vga1
 agp0 at intagp0: aperture at 0xd000, size 0x1000
 inteldrm0 at vga1
 drm0 at inteldrm0
 inteldrm0: 1024x600
 wsdisplay0 at vga1 mux 1: console (std, vt100 emulation)
 wsdisplay0: screen 1-5 added (std, vt100 emulation)
 Intel 82945GM Video rev 0x03 at pci0 dev 2 function 1 not configured
 azalia0 at pci0 dev 27 function 0 Intel 82801GB HD Audio rev 0x02: msi
 azalia0: codecs: Realtek ALC269
 audio0 at azalia0
 ppb0 at pci0 dev 28 function 0 Intel 82801GB PCIE rev 0x02: apic 2 int 16
 pci1 at ppb0 bus 4
 ppb1 at pci0 dev 28 function 1 Intel 82801GB PCIE rev 0x02: apic 2 int 17
 pci2 at ppb1 bus 2
 athn0 at pci2 dev 0 function 0 Atheros AR9285 rev 0x01: apic 2 int 17
 athn0: AR9285 rev 2 (1T1R), ROM rev 13, address 00:25:d3:8a:f6:b4
 ppb2 at pci0 dev 28 function 3 Intel 82801GB PCIE rev 0x02: apic 2 int 19
 pci3 at ppb2 bus 1
 alc0 at pci3 dev 0 function 0 Attansic Technology L2C rev 0xc0: msi, address 90:e6:ba:37:cf:5e
 atphy0 at alc0 phy 0: F1 10/100/1000 PHY, rev. 11
 uhci0 at pci0 dev 29 function 0 Intel 82801GB USB rev 0x02: apic 2 

Re: 10G NIC recommendation

2013-08-14 Thread David Gwynne
i'm using myx(4). i'm biased though.

On 15/08/2013, at 9:09 AM, Diana Eichert deich...@wrench.com wrote:

 What I want to do.
 
 create a netflow collector using OpenBSD by looking at
 data fed from a tap
 
 I know which 10G NICs are supported by OpenBSD, what I'd
 like to hear is a recommendation on which one of the
 following to use.
 
 $ apropos 10G
 che, cheg (4) - Chelsio Communications 10Gb Ethernet device
 ix (4) - Intel 82598/82599/X540 PCI Express 10Gb Ethernet device
 ixgb (4) - Intel PRO/10GbE 10Gb Ethernet device
 myx (4) - Myricom Myri-10G PCI Express 10Gb Ethernet device
 oce (4) - Emulex OneConnect 10Gb Ethernet device
 tht, thtc (4) - Tehuti Networks 10Gb Ethernet device
 xge (4) - Neterion Xframe/Xframe II 10Gb Ethernet device
 
 I do have a few Myricom 10G-PCIE2-8B2-2S available already.
 However I have funds available to get something else if one
 of the other cards performs better.
 
 thanks
 
 diana
 
 
 Past hissy-fits are not a predictor of future hissy-fits.
 Nick Holland (06 Dec 2005)



Re: PF sync doesn't not work very well

2013-07-04 Thread David Gwynne
On 03/07/2013, at 10:11 PM, Mark Felder f...@feld.me wrote:

 On Wed, 03 Jul 2013 07:00:02 -0500, Loïc Blot loic.b...@unix-experience.fr 
 wrote:
 
 Hello,
 No, carp is not used at this time.
 
 pfsync needs to be used with carp... without it you're just playing 
 whack-a-mole with your session table.

no, it doesn't. pfsync just does its best to keep the state table in sync; it in 
no way relies on carp to achieve that.

however, it does provide feedback to carp to try and avoid the box becoming a 
master and therefore taking traffic until it either thinks it has the whole 
state table from a peer or it is alone.



Re: PF sync doesn't not work very well

2013-07-04 Thread David Gwynne
On 03/07/2013, at 6:23 PM, Loïc Blot loic.b...@unix-experience.fr wrote:

 Okay, defer is now enabled on the pfsync interface (sorry about my last
 message, I didn't have the man page at hand :) ).
 It seems the problem isn't resolved.
 The transfer starts but stalls at random times.

i have hit this too, despite being the person most responsible for trying to 
make pfsync work in active-active (hi bob!) configurations.

the problem is the tcp window tracking pf does, and how pfsync tries to cope 
with different routers being responsible for different halves of the packet 
flow. pfsync tries to merge each side of the tcp windows and tries to detect 
split paths to exchange updates more rapidly for those states. however, i find 
at some point the actual tcp windows move too fast for pfsync to keep up and 
all the real packets fall out of the window, causing the stalls you're talking 
about.

my solution is to try and prefer one half of the firewalls for all traffic, and 
use the second for handling failure. the split path handling works well enough 
that we can support traffic while we change roles (moving master to slave and 
slave to master) and the upstream hasn't figured it out yet via ospf.

sorry for the bad news. i might try and have a look at the state merge code 
again and see if there's something obvious i am missing.

cheers,
dlg

 -- 
 Best regards, 
 
 Loïc BLOT, Engineering
 UNIX Systems, Security and Networks
 http://www.unix-experience.fr
 
 
 On Wednesday 3 July 2013 at 08:12 +0200, Loïc BLOT wrote:
 Hi,
 Thanks for your reply. I wasn't careful about this section.
 If I understand correctly, I must add the defer option to my WAN iface (or
 am I wrong, and I must add it to my vlan995 iface?)
 
 I will test it this morning and report back to misc :)
 --
 Best regards,
 Loïc BLOT,
 UNIX systems, security and network expert
 http://www.unix-experience.fr
 
 
 On Wednesday 03 July 2013 at 02:02 +0200, mxb wrote:
 pfsync(4) explains this:
 
  The pfsync interface will attempt to collapse multiple state updates into
  a single packet where possible.  The maximum number of times a single
  state can be updated before a pfsync packet will be sent out is
  controlled by the maxupd parameter
 
 
 
 and
 
  Where more than one firewall might actively handle packets, e.g. with
  certain ospfd(8), bgpd(8) or carp(4) configurations, it is beneficial to
  defer transmission of the initial packet of a connection.  The pfsync
  state insert message is sent immediately; the packet is queued until
  either this message is acknowledged by another system, or a timeout has
  expired.  This behaviour is enabled with the defer parameter to
  ifconfig(8).
 
 
 
 E.g. defer: on; yours is off.
 
 //mxb
 
 
 On 2 jul 2013, at 21:54, Loïc BLOT loic.b...@unix-experience.fr wrote:
 
 Hi all
 I have a strange issue (or i haven't read pfsync correctly but i don't
 think this is the problem :D)
 
 I'm using 2 OpenBSD as BGP+OSPF routers at the border of one site.
 
 Those BGP routers are secured with strong PF in stateful mode, and the
 stateful is working very well on each router. Because of my full mesh
 BGP configuration, the outgoing layer 7 sessions can leave my network by
 one router and responses can income by the other.
 
 To resolve this issue, i have created a dedicated VLAN for the pfsync
 traffic and attached pfsync to this VLAN.
 
 Here is a sample output of ifconfig on my first router:
 
 vlan995: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
   lladdr a0:36:9f:10:4a:a6
   priority: 0
   vlan: 995 parent interface: trunk1
   groups: vlan
   status: active
   inet6 fe80::a236:9fff:fe10:4aa6%vlan995 prefixlen 64 scopeid 0x10
   inet 10.117.1.129 netmask 0xfffffff8 broadcast 10.117.1.135
 pfsync0: flags=41<UP,RUNNING> mtu 1500
   priority: 0
   pfsync: syncdev: vlan995 maxupd: 255 defer: off
   groups: carp pfsync
 
 And here on my second router:
 
 vlan995: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
   lladdr a0:36:9f:17:e2:1e
   priority: 0
   vlan: 995 parent interface: trunk1
   groups: vlan
   status: active
   inet6 fe80::a236:9fff:fe17:e21e%vlan995 prefixlen 64 scopeid 0x10
   inet 10.117.1.130 netmask 0xfffffff8 broadcast 10.117.1.135
 pfsync0: flags=41<UP,RUNNING> mtu 1500
   priority: 0
   pfsync: syncdev: vlan995 maxupd: 255 defer: off
   groups: carp pfsync
 
 As you can see in the following tcpdump capture, there is some traffic between
 the two routers:
 
 # tcpdump -nni vlan995
 tcpdump: listening on vlan995, link-type EN10MB
 tcpdump: WARNING: compensating for unaligned libpcap packets
 23:41:13.699617 10.117.1.130: PFSYNCv6 len 108
   act UPD ST COMP count 1
   ...
 (DF) [tos 0x10]
 23:41:14.158500 10.117.1.129: PFSYNCv6 len 108
   act UPD ST COMP count 1
   ...
 (DF) [tos 0x10]
 23:41:14.941396 SSTP STP config root=83e3.0:a:b8:7b:27:80 rootcost=3
 bridge=c3e3.0:17:e:2e:f:80 port=142 ifcost=130 age=1/0 

Re: PF sync doesn't not work very well

2013-07-04 Thread David Gwynne
you could try using sloppy states like henning suggested. you'll still get to 
write stateful rules and get the tcp state machine checks but not the tcp 
window checks.

if it works with sloppy states it narrows the issue down to the pfsync state 
merge code. at the moment im kind of guessing thats where the problem is.

dlg

On 05/07/2013, at 5:30 AM, Loïc BLOT loic.b...@unix-experience.fr wrote:

 Hello all,
 thanks for this interesting debate about pf syncing.
 To recall my initial question:
 
 pfsync seems to sync states, but not correctly, on my BGP+OSPF routers.
 Because each BGP router is master/standby to 2 neighbors (full-meshed
 BGP), packets leaving through one router can come back in through the other
 router, so if I want to use pf as a stateful firewall I must use
 pfsync to sync created states from router A to router B.
 
 If you tell me it's not possible, then i will use pf as a stateless
 firewall.
 
 --
 Best regards,
 Loïc BLOT,
 UNIX systems, security and network expert
 http://www.unix-experience.fr
 
 
 On Thursday 04 July 2013 at 13:17 -0500, Mark Felder wrote:
 My apologies for just being noise; I missed his first full post with
 much more detail. I was picturing him trying to run redundant servers
 without CARP and running into issues of states disappearing.
 
 [demime 1.01d removed an attachment of type application/pgp-signature which 
 had a name of signature.asc]



Re: tftpd loop

2013-07-01 Thread David Gwynne
what does host -- -c say?

On 29/06/2013, at 3:54 AM, Peter Fraser p...@thinkage.ca wrote:

 tftpd -l -c xxx
 
 resulting from a typo, causes tftpd to go into a loop



Re: BCM5720, LACP and CARP serious problem

2013-05-05 Thread David Gwynne
do you have a real serial console hooked up to the machines? more specifically, 
can you break into ddb when the machine breaks and get a trace?

i use carp on vlans on lacp trunks on top of myx(4) and em(4) quite a lot 
without trouble, so its likely to be bge(4) if you ask me. unfortunately that 
means its my fault or responsibility.

if you could get a trace to verify, that would be much appreciated.

cheers,
dlg

On 05/05/2013, at 4:11 AM, Loïc Blot loic.b...@unix-experience.fr wrote:

 Hello misc.
 On Thursday I upgraded one of our BGP border routers to OpenBSD 5.3,
 and I was pleased to get the BCM5720 working. I added it to the
 existing LACP trunk for the LAN (2 LACP trunks now: 2 ports on the WAN, 4 on the LAN).
 There is no problem on this router.
 
 Today I wanted to upgrade exactly the same model (Dell R320 with a PCI Intel
 card and a BCM5720 on the motherboard, plus a PCI BCM5720), and I have some very
 problematic issues. The OpenBSD upgrade works like a charm, but when I use
 LACP with the Broadcom cards, after a while the system freezes totally and
 nothing responds (neither ssh connections nor the server's screen and
 keyboard).
 On this router ports must be aggregated in threes (3 for LAN, 3 for DMZ), so
 each trunk has 1 Intel port and 2 Broadcom ports.
 I have tried two configurations: both ports from the same BCM5720 card in the
 trunk, and 1 port from each card. The same problem appears.
 Finally, I disabled all ports except the working Intel card, but the
 problem still occurs. The only way I have found to keep the server
 working is to bring up bge1 and bge2 and bring down the other interfaces (on the CISCO
 2960G switch, IOS 12.2(55)SE3); that is the only case where the server doesn't
 freeze.
 Each time I thought a try was a success, I waited 5 minutes and the
 problem occurred, or it occurred when I rebooted the machine.
 
 Other detail, the working router with BCM 5720 is between an Alcatel
 6850 and a CISCO 4507 (Supervisor IV, ios 12.2(54)SG)
 
 OpenBSD shouldn't freeze totally. I think something is missing in the BCM
 driver or in the LACP handling, or maybe BCM + LACP + CARP isn't a good idea,
 but I don't have any choice :s
 
 Thanks in advance.
 -- 
 Best regards, 
 
 Loïc BLOT, Engineering
 UNIX Systems, Security and Networks
 http://www.unix-experience.fr



Re: hint on starting tftpd -r

2013-02-26 Thread David Gwynne
On 24/02/2013, at 11:38 AM, russell russ...@dotplan.dyndns.org wrote:

 On 02/24/2013 11:32 PM, David Gwynne wrote:
 what are you using the rewrite stuff for?
 
 netbooting.

me too!

we unconditionally netboot all our labs (and most of our staff machines). by 
default we want netboot to fall back to booting off the disk, but if we
want to push a new image we click stuff on a webpage and the netboot config 
gets pointed at alternate config that reimages the machine.

it is kind of fun to wol a lab of 60 machines and have them pxeboot 
simultaneously. ive seen tftpd push over 50MB/s in situations like that.

dlg

 
 pxeboot is unable to pick a kernel based on machine.
 and as I run an oddball mix of current/stable
 i386/amd64 (and sparc64 but it does not count as ofwboot.net does specify 
 kernel)
 
 so I use tftpd rewrite rules to load the correct kernel.
 
 I use my constantly growing collection of old machines sort of in the manner 
 you would use a vm.
 copy tree, send wol, have new server.
 
 In all honesty it is sort of stupid, but I am having fun setting it up.
 
 And just for grins and giggles this is what I am using to rewrite
 I am sure my inexperience shows, but it is good to learn something new
 
 #!/usr/local/bin/python
 # rewrite tftp requests for tftpd -r
 import socket, os
 
 #tftpd_rewrite_address = '/var/run/tftpd.sock'
 tftpd_rewrite_address = '/tmp/tftpd.sock'
 tftpd_base = '/tftpboot'
 
 # remove a stale socket left over from a previous run
 if os.path.exists(tftpd_rewrite_address):
     os.unlink(tftpd_rewrite_address)
 listen_socket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
 listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
 listen_socket.bind(tftpd_rewrite_address)
 listen_socket.listen(1)
 tftpd_socket, addr = listen_socket.accept()
 
 # tftpd sends one "address operation filename" line per request
 REQUEST_ADDR = 0
 REQUEST_CMD = 1
 REQUEST_FILE = 2
 cmd_list = ['quit']
 cmd = ''
 while cmd != 'quit':
     # assumes each recv() returns whole lines
     tftp_request = tftpd_socket.recv(1024).decode()
     if not tftp_request:
         break  # tftpd closed the connection; avoid spinning forever
     for request in tftp_request.strip().split('\n'):
         if request in cmd_list:
             cmd = request
         else:
             request_data = request.split(' ', 3)
             if len(request_data) == 3:
                 response = request_data[REQUEST_FILE] + '\n'
                 host_name = socket.gethostbyaddr(request_data[REQUEST_ADDR])
                 short_name = host_name[0].split('.')[0]
                 # serve from /tftpboot/<short hostname>/ if it exists
                 if os.path.isdir(os.path.join(tftpd_base, short_name)):
                     if os.path.isabs(response):
                         response = response[1:]  # remove leading /
                         short_name = '/' + short_name
                     response = os.path.join(short_name, response)
                 tftpd_socket.sendall(response.encode())
 
 tftpd_socket.close()
 listen_socket.close()
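The per-request logic of the script above can be pulled out into a pure function so it can be tested without tftpd or DNS. This is a sketch: `lookup_host` and `has_host_dir` are hypothetical hooks standing in for `socket.gethostbyaddr` and the `os.path.isdir` check under /tftpboot, and unlike the script it splits with `maxsplit=2`, so a filename containing spaces is kept whole.

```python
import os


def rewrite_request(line, lookup_host, has_host_dir):
    """Return the reply line for one "address operation filename" request.

    lookup_host(address) -> fully qualified host name (hypothetical hook).
    has_host_dir(short_name) -> True if a per-host directory exists
    (hypothetical hook). Returns None for a malformed request.
    """
    parts = line.split(' ', 2)
    if len(parts) != 3:
        return None
    address, _operation, filename = parts
    short_name = lookup_host(address).split('.')[0]
    if has_host_dir(short_name):
        if os.path.isabs(filename):
            # keep the reply absolute: /bsd -> /<host>/bsd
            filename = '/' + os.path.join(short_name, filename.lstrip('/'))
        else:
            # relative request: bsd -> <host>/bsd
            filename = os.path.join(short_name, filename)
    return filename + '\n'


def fake_lookup(address):
    return 'lab01.example.org'  # hypothetical DNS answer


print(rewrite_request('10.0.0.5 read /bsd', fake_lookup, lambda name: True))
```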



Re: hint on starting tftpd -r

2013-02-24 Thread David Gwynne
what are you using the rewrite stuff for?

On 23/02/2013, at 5:45 PM, russell russ...@dotplan.dyndns.org wrote:

 So I am using tftpd -r socket and my rewrite script works however I am at a 
 loss as to the best way to start tftpd.
 
 From my experiments, the rewrite engine has to start before tftpd, tftpd 
 expects the socket to exist. however tftpd is started rather earlier in 
 /etc/rc than a pkg_scripts rc.d entry (my initial choice).
 
 So my options as I see them are.
 
 1 modify /etc/rc.d/tftpd to start the rewrite engine
 benefit: the two programs really do need to run together
 problems: will get erased during upgrade.
 
 2 modify /etc/rc to start tftpd_rewrite_engine before tftpd
 problems: nonstandard rc, changes will get erased during upgrade
 
 3 remove tftpd from rc.conf.local and make custom rc.d/tftpd_local
  that will start both processes from pkg_scripts
 problems: nonstandard tftpd start
 
 
 I am hoping I have missed something obvious but will probably go with choice
 three(the pkg_scripts tftpd)
 
 And, if anyone wishes to see it, I would be happy to share my rewrite
 script; however, about the best that can be said about it is: it works. It is
 written in python and I have little experience writing socket code.
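Whichever option is chosen, the ordering constraint described above (the socket must exist before tftpd starts) can be enforced by waiting for the socket after launching the rewrite service. A minimal Python sketch; `wait_for_socket` is a hypothetical helper, not part of rc.subr, and the timeout values are arbitrary.

```python
import os
import stat
import time


def wait_for_socket(path, timeout=10.0, interval=0.1):
    """Poll until path exists and is a UNIX-domain socket.

    Returns True if the socket appeared within timeout seconds, else False.
    A start-up script could call this after launching the rewrite service
    and only then start tftpd -r with the same path.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if stat.S_ISSOCK(os.stat(path).st_mode):
                return True
        except FileNotFoundError:
            pass  # not created yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```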



Re: OpenBSD changes virtual nic driver in vmware workstation?

2012-08-14 Thread David Gwynne
On 13/08/2012, at 5:42 PM, C. L. Martinez wrote:

 Hi all,

 I am trying to do some tests with OpenBSD 5.1 and FreeBSD 9.1 beta in
 my laptop virtual lab based on vmware workstation 8. But I have found
 a problem when I try to configure OpenBSD vms: I can't use e1000
 driver with these OpenBSD vms. I have tried to setup these OpenBSD vms
 as FreeBSD virtual guests, as Other, as RHEL, etc ... (and yes, I have
 changed .vmx config file to ethernetX.virtualDriver = e1000 every
 time) but when OpenBSD boots, every time change virtual nic driver to
 vicX  (in .vmx config appears as a vlance, the worst driver
 possible)...

THE WORST DRIVER POSSIBLE!

that would be true if pcn(4) attached, but vic(4) is an implementation of
vmxnet2, which is likely the least worst option available to you in a vmware
guest. it should even be less worse than em(4).

dlg


 After doing several tests, like installing FreeBSD to see if same
 problem occurs, I conclude that the problem may be with OpenBSD ifself
 making the change, is it right??

 Curiously, I have five OpenBSD vms under two ESXi servers, and this
 problem doesn't appears: I can use e1000 configuring OpenBSD vms as
 FreeBSD guest or Other ...

 Any idea??



Re: myricom not listed in supported hardware list

2012-05-29 Thread David Gwynne
yes, they work well.

dlg

On 29/05/2012, at 11:38 PM, Pierre Berthier wrote:

 Hi
 
 it seems to me the Myricom 10GB Ethernet devices should be supported by
 OpenBSD, according to myx(4) and the What's new page of 5.0
 http://www.openbsd.org/50.html#new and actually also 4.2
 http://www.openbsd.org/plus42.html
 
 However there are no mention of those cards in the Supported hardware
 list: http://www.openbsd.org/i386.html#hardware
 
 Can anyone confirm that the Myricom cards are supported before I buy
 one?
 
 Thanks!
 
 -- 
 Pierre



Re: 10G router without polling ?

2011-12-22 Thread David Gwynne
On 22/12/2011, at 6:20 PM, PP;QQ P(P8P?P8QP8P= wrote:

 am I right that OpenBSD does NOT use device polling like FreeBSD or
 Linux (called NAPI) do ?

yes.

 any router (even at 10G rate) will perfectly work without polling ?

my understanding is that polling is to limit/cap the amount of work the system
will do so network traffic cannot livelock the system. openbsd has other
mechanisms to mitigate against network livelock.
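For contrast, the polling idea being asked about boils down to a per-pass work budget. This Python sketch only illustrates that general technique (as in NAPI); it is not OpenBSD code and does not show the mechanisms OpenBSD uses instead. `poll_once` and the ring are hypothetical.

```python
from collections import deque


def poll_once(rx_ring, budget, handle):
    """Process at most `budget` packets from rx_ring; return how many were handled.

    Capping the work done per pass is what stops a packet flood from
    starving everything else on the system.
    """
    done = 0
    while rx_ring and done < budget:
        handle(rx_ring.popleft())
        done += 1
    return done


ring = deque(range(10))  # ten packets waiting
seen = []
print(poll_once(ring, 4, seen.append), len(ring))  # 4 6
```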

 In particular, I have a router (100-200Mb rate now) with a Broadcom BCM5721,
 which is bge, and an Intel PRO/1000 QP (82571EB), which is em.
 Will those cards work perfectly at any speed without any special tuning?

your cpu matters more than your network card.

dlg



Re: OpenBSD and shebang line to a script not supported?

2011-10-31 Thread David Gwynne
linux runs infinite loops in 5 minutes, so thats not a huge problem for them.

On 01/11/2011, at 2:05 PM, Andres Perera wrote:

 how does linux handle that without going into infinite loops?
 
 On Mon, Oct 31, 2011 at 6:55 PM, Mikolaj Kucharski
 miko...@kucharski.name wrote:
 Hi,
 
 Attached archive has small testing scripts to be extracted in /tmp.
 There are 2 tests (exec1 and exec2) with 2 scripts each (4 scripts
 total):
 
 test#1, openbsd:
 $ /tmp/exec1.sh
 exec1.sh executed
 
 test#1, linux:
 # /tmp/exec1.sh
 /tmp/exec1.pl executed
 exec1.sh executed
 
 
 test#2, openbsd:
 $ /tmp/exec2.pl
 /tmp/exec2.pl[3]: use: not found
 /tmp/exec2.pl[4]: use: not found
 /tmp/exec2.pl[6]: syntax error: `(' unexpected
 
 test#2, linux:
 # /tmp/exec2.pl
 exec2.sh executed
 exec2.sh executed
 exec2.sh executed
 ^C
 
 
 What I see is that OpenBSD doesn't support scripts in shebang line and
 executes /bin/sh instead. Am I correct here?
 
 
 PS. Please CC me in replies. Thanks.
 
 --
 best regards
 q#
 
 [demime 1.01d removed an attachment of type application/x-tar-gz]
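A small Python helper can spot the situation tested above: a script whose #! interpreter is itself a #! script. Going by the output in this thread, OpenBSD does not follow such a chain (the file ends up interpreted by /bin/sh), while Linux does. The helper names are hypothetical and it only inspects files; it does not model either kernel.

```python
def interpreter_of(path):
    """Return the interpreter path named on a file's #! line, or None.

    None is also returned if the file is missing or unreadable.
    """
    try:
        with open(path, 'rb') as f:
            first = f.readline()
    except OSError:
        return None
    if not first.startswith(b'#!'):
        return None
    fields = first[2:].split()
    return fields[0].decode() if fields else None


def has_nested_interpreter(path):
    """True if the script's interpreter is itself a #! script."""
    interp = interpreter_of(path)
    return interp is not None and interpreter_of(interp) is not None
```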



Re: pfsync0 MTU

2011-10-22 Thread David Gwynne
mike,

might have to tweak hardmtu in attach too. maybe.

dlg

On 23/10/2011, at 6:18 AM, Mike Belopuhov wrote:

 On Sat, Oct 22, 2011 at 20:14 +0200, Maxim Bourmistrov wrote:

 On both sides I use em(4) with MTU 9000.
 Then tried to set the same value to the pfsync with success (ifconfig
pfsync0 mtu 9000), but the actual value I see is 2048.


 ugh. i thought you've fixed up the source code.
 i'm curious if it'll still work with a smaller mtu on the physical
 interface :-)

 Index: net/if_pfsync.c
 ===
 RCS file: /cvs/src/sys/net/if_pfsync.c,v
 retrieving revision 1.169
 diff -u -p -r1.169 if_pfsync.c
 --- net/if_pfsync.c   20 Oct 2011 08:57:26 -  1.169
 +++ net/if_pfsync.c   22 Oct 2011 18:17:44 -
 @@ -1294,8 +1294,8 @@ pfsyncioctl(struct ifnet *ifp, u_long cm
   s = splnet();
   if (ifr->ifr_mtu <= PFSYNC_MINPKT)
   return (EINVAL);
 - if (ifr->ifr_mtu > MCLBYTES) /* XXX could be bigger */
 - ifr->ifr_mtu = MCLBYTES;
 + if (ifr->ifr_mtu > 65536)
 + ifr->ifr_mtu = 65536;
   if (ifr->ifr_mtu < ifp->if_mtu)
   pfsync_sendout();
   ifp->if_mtu = ifr->ifr_mtu;
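A Python model of the SIOCSIFMTU branch touched by the diff, showing why `ifconfig pfsync0 mtu 9000` silently becomes 2048: before the patch the requested value is clamped to MCLBYTES (2048 bytes on OpenBSD). `minpkt` stands in for PFSYNC_MINPKT, whose real value is not shown here, so the numbers in the usage lines are assumptions.

```python
MCLBYTES = 2048  # mbuf cluster size on OpenBSD; the value seen in this thread


def pfsync_set_mtu(requested, current, minpkt, limit=MCLBYTES):
    """Model the SIOCSIFMTU handling: return (new_mtu, flush_needed).

    Raises ValueError where the kernel returns EINVAL. The patch in this
    thread effectively raises `limit` from MCLBYTES to 65536.
    """
    if requested <= minpkt:
        raise ValueError('EINVAL: MTU too small for a pfsync packet')
    if requested > limit:
        requested = limit  # silently clamped, so 9000 becomes 2048
    # the kernel calls pfsync_sendout() when the MTU shrinks
    flush_needed = requested < current
    return requested, flush_needed


print(pfsync_set_mtu(9000, 1500, minpkt=100))               # (2048, False)
print(pfsync_set_mtu(9000, 1500, minpkt=100, limit=65536))  # (9000, False)
```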



Re: various documentation for Silicon Image chipsets

2011-07-24 Thread David Gwynne
i believe a lot of these docs were opened up due to jeff garzik talking to
silicon image as part of his work on libata in linux.

credit where credit is due...

dlg

On 23/07/2011, at 10:49 PM, Sevan / Venture37 wrote:

 Hi,
 Someone posted a series of links to the freebsd-hardware mailing list
 for docs of various silicon image chipsets, I thought this might be of
 interest to some of you.
 http://lists.freebsd.org/pipermail/freebsd-hardware/2011-July/006754.html



 Sevan / Venture37



Re: 4.7 ospfd FIB/RIB synchronization

2011-07-24 Thread David Gwynne
On 24/07/2011, at 8:27 PM, Jonathan Lassoff wrote:

 On Wed, Apr 20, 2011 at 7:10 AM, David Gwynne l...@animata.net wrote:

 On 20/04/2011, at 11:08 PM, Jonathan Lassoff wrote:

 On Wed, Apr 20, 2011 at 4:22 AM, David Gwynne l...@animata.net wrote:
 you might be able to upgrade your passive firewall to 4.9 next to the
active 4.7 one. it looks like the protocol stayed the same so they should be
able to talk to each other.

 This would seem to be the case.

 This (http://undeadly.org/cgi?action=article&sid=20090301211402) is an
 absolutely excellent bit of writing about the improvements to pfsync,
 BTW. Thanks for letting that be shared.

 however, it looks like bulk updates were broken in 4.7, which would
explain your failover problems. you can work around that by going pfctl -S
/dev/stdout | ssh activefw pfctl -L /dev/stdin as root on the passive fw.

 As an initial seeding of state? It seems to me that only some of my
 flows get affected when failing over (not everything is reset and
 traffic can still flow).

 yes. the pfctl commands will do a bulk update since the in kernel
implementation was unreliable back then.

 It appears that both firewalls have an approximately congruent set of
 states, but usually a pfctl -ss | wc -l can be off by several
 hundred, to several thousand states at times. My hunch is that state
 creation and counter updates are not updated synchronously, so when
 failing over there are still some updates in-flight, and for flows
 that are moving their sequence numbers at a decent clip I could see
 why they might get reset.

 pf has a bit of fuzz when it does its tcp window matching, so packets can
get ahead of the firewall and be ok.

 Do you know if there is a way to see how much this fuzz is or if
 there's an offset?

from memory its 1000 bytes.

 If dropped for being out of a window, will (or can) it get logged to pflog?

again, from memory its just dropped.

 i wrote defer, so yes...

 on my boxes the increase in latency is about .2 to .3ms. if a firewall is
missing its peer(s) it will go up to about 1/100th of a second.

 So does defer wait for a peer to acknowledge a new state just at the
 time of creation, or does it include state updates about sequence
 numbers as well?

defer only delays the first packet.
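A Python sketch of the defer behaviour as pfsync(4) describes it: the first packet of a new state is held until a peer acknowledges the insert message or a timeout expires, and later packets are not delayed. `DeferredPacket` and its methods are hypothetical names; the 0.01 s default only echoes the "about 1/100th of a second" figure quoted above.

```python
import threading


class DeferredPacket:
    """Hold the first packet of a new state until a peer acks or a timer fires."""

    def __init__(self, packet, transmit, timeout=0.01):
        self.packet = packet
        self._transmit = transmit
        self._sent = False
        self._lock = threading.Lock()
        self._timer = threading.Timer(timeout, self.release)
        self._timer.start()

    def release(self):
        """Send the packet exactly once, whether triggered by an ack or the timer."""
        with self._lock:
            if self._sent:
                return
            self._sent = True
        self._timer.cancel()  # no-op if the timer is what called us
        self._transmit(self.packet)

    def on_peer_ack(self):
        """A peer acknowledged the state insert: stop waiting."""
        self.release()
```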

 I suspect I'm hitting a similar issue as you were with long-lived
 flows getting reset at failover.

i think my problem is that i run both firewalls with the carp demotion counter
set low. when a box is rebooted the carp default is at 0 or 1, which means it
takes over traffic before it gets all the states. later code in rc.local
demotes it, but by that time some packets have been eaten by the new box. i
should fix it, but im lazy.
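A toy Python model of why a freshly rebooted box grabs traffic: carp(4) documents the advertisement interval as advbase plus advskew/256 seconds, and the host advertising fastest wins. The model additionally assumes preemption is enabled (sysctl net.inet.carp.preempt=1) and that a lower demotion counter takes priority over timing; that ordering is a simplification, and `CarpPeer` and `preferred_master` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CarpPeer:
    name: str
    advbase: int   # seconds between advertisements
    advskew: int   # skew in 1/256ths of a second
    demote: int    # carp group demotion counter


def interval(peer):
    """Advertisement interval per carp(4): advbase + advskew/256 seconds."""
    return peer.advbase + peer.advskew / 256


def preferred_master(peers):
    """Simplified election: lowest demotion counter, then shortest interval."""
    return min(peers, key=lambda p: (p.demote, interval(p)))


rebooted = CarpPeer('rebooted', advbase=1, advskew=0, demote=0)
survivor = CarpPeer('survivor', advbase=1, advskew=100, demote=0)
print(preferred_master([rebooted, survivor]).name)  # rebooted: takes traffic too early
rebooted.demote = 1  # what the later rc.local step does
print(preferred_master([rebooted, survivor]).name)  # survivor
```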

 thats exactly how i have my stuff configured.

 Have you ever had trouble when re-numbering an interface? It seems to
 me like ospfd doesn't pick up changes in interface numbering if
 changed out from under it. Most other OSPF daemons I use would pick
 this up as it changes, but as far I as can tell there's no way to tell
 ospfd to reload interface addressing.

interfaces and addresses moving around hurts me too.

 I'm often needing to add more and more interfaces and ospf interfaces,
 necessitating failing over so as to make it safe to kill and re-start
 ospfd -- in the process it just seems to nip some flows from flowing.

i do that too. lets annoy claudio together!



Re: splassert: assertwaitok: want -1 have 1 (bnx)

2011-06-29 Thread David Gwynne
On 30/06/2011, at 6:56 AM, Ted Unangst wrote:

 On Wed, 29 Jun 2011, Tom Murphy wrote:

 /bsd: bnx0: Watchdog timeout occurred, resetting!
 /bsd: splassert: assertwaitok: want -1 have 1
 /bsd: Starting stack trace...
 /bsd: assertwaitok() at assertwaitok+0x1c
 /bsd: pool_get() at pool_get+0x95
 /bsd: bnx_alloc_pkts() at bnx_alloc_pkts+0x31
 /bsd: bnx_init_tx_chain() at bnx_init_tx_chain+0x13
 /bsd: bnx_init() at bnx_init+0x18c
 /bsd: bnx_watchdog() at bnx_watchdog+0x4d

 This driver is filled with bad juju.  This changes all the waitoks to not
 ok, so they are interrupt safe.  It already appears to handle the failure
 case.  The rwlock is also totally unsafe and unnecessary.

the issue is that bnx_init is called from softclock when it looks like bnx
doesnt get any interrupts (so it doesnt do tx completions). i assumed bnx_init
was only called from the ioctl paths which have process context.

this diff is also unsafe because you still init the pool with the nointr
allocator, but you're trying to fix the code so bnx_alloc_pkts via bnx_init is
ok to call from interrupt context.

a simpler fix would be to have bnx_watchdog use the system workq to call
bnx_init to reset the chip.
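The shape of that suggestion, sketched in Python rather than kernel C: the watchdog path, which must not sleep, only queues a request, and a separate worker with process-like context performs the reset that may allocate or sleep. `ResetWorker` is hypothetical, and a thread plus `queue.Queue` stands in for the kernel's system workq.

```python
import queue
import threading


class ResetWorker:
    """Run a slow reset outside the context that detected the problem."""

    def __init__(self, reset):
        self._reset = reset
        self._q = queue.Queue()
        self._pending = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def watchdog_fired(self):
        """Cheap and non-blocking: safe for the 'no sleeping' side."""
        if not self._pending.is_set():  # coalesce repeated timeouts
            self._pending.set()
            self._q.put(None)

    def _run(self):
        while True:
            self._q.get()
            self._pending.clear()
            try:
                self._reset()  # may allocate or sleep here
            finally:
                self._q.task_done()

    def drain(self):
        """Wait until all queued resets have run."""
        self._q.join()
```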

it would also be worthwhile figuring out why this box is calling the watchdog
code on the chip.

dlg


 Index: if_bnx.c
 ===
 RCS file: /home/tedu/cvs/src/sys/dev/pci/if_bnx.c,v
 retrieving revision 1.95
 diff -u -r1.95 if_bnx.c
 --- if_bnx.c  22 Jun 2011 16:44:27 -  1.95
 +++ if_bnx.c  29 Jun 2011 20:56:09 -
 @@ -392,8 +392,8 @@
 void  bnx_stats_update(struct bnx_softc *);
 void  bnx_tick(void *);

 -struct rwlock bnx_tx_pool_lk = RWLOCK_INITIALIZER("bnxplinit");
 -struct pool *bnx_tx_pool = NULL;
 +int bnx_pool_inited;
 +struct pool bnx_tx_pool;
 void  bnx_alloc_pkts(void *, void *);


/
/
 @@ -759,6 +759,12 @@
   goto bnx_attach_fail;
   }

 + if (!bnx_pool_inited) {
 + pool_init(&bnx_tx_pool, sizeof(struct bnx_pkt),
 + 0, 0, 0, "bnxpkts", &pool_allocator_nointr);
 + bnx_pool_inited = 1;
 + }
 +
   mountroothook_establish(bnx_attachhook, sc);
   return;

 @@ -3727,13 +3733,13 @@
   int s;

   for (i = 0; i < 4; i++) { /* magic! */
 - pkt = pool_get(bnx_tx_pool, PR_WAITOK);
 + pkt = pool_get(&bnx_tx_pool, PR_NOWAIT);
   if (pkt == NULL)
   break;

   if (bus_dmamap_create(sc->bnx_dmatag,
   MCLBYTES * BNX_MAX_SEGMENTS, USABLE_TX_BD,
 - MCLBYTES, 0, BUS_DMA_WAITOK | BUS_DMA_ALLOCNOW,
 + MCLBYTES, 0, BUS_DMA_NOWAIT | BUS_DMA_ALLOCNOW,
   &pkt->pkt_dmamap) != 0)
   goto put;

 @@ -3760,7 +3766,7 @@
 stopping:
   bus_dmamap_destroy(sc->bnx_dmatag, pkt->pkt_dmamap);
 put:
 - pool_put(bnx_tx_pool, pkt);
 + pool_put(&bnx_tx_pool, pkt);
 }


/
/
 @@ -3906,7 +3912,7 @@
   mtx_leave(&sc->tx_pkt_mtx);

   bus_dmamap_destroy(sc->bnx_dmatag, pkt->pkt_dmamap);
 - pool_put(bnx_tx_pool, pkt);
 + pool_put(&bnx_tx_pool, pkt);

   mtx_enter(&sc->tx_pkt_mtx);
   }
 @@ -4678,25 +4684,9 @@
   struct bnx_softc *sc = (struct bnx_softc *)xsc;
   struct ifnet *ifp = &sc->arpcom.ac_if;
   u_int32_t   ether_mtu;
 - int txpl = 1;
   int s;

   DBPRINT(sc, BNX_VERBOSE_RESET, "Entering %s()\n", __FUNCTION__);
 -
 - if (rw_enter(&bnx_tx_pool_lk, RW_WRITE | RW_INTR) != 0)
 - return;
 - if (bnx_tx_pool == NULL) {
 - bnx_tx_pool = malloc(sizeof(*bnx_tx_pool), M_DEVBUF, M_WAITOK);
 - if (bnx_tx_pool != NULL) {
 - pool_init(bnx_tx_pool, sizeof(struct bnx_pkt),
 - 0, 0, 0, "bnxpkts", &pool_allocator_nointr);
 - } else
 - txpl = 0;
 - }
 - rw_exit(&bnx_tx_pool_lk);
 -
 - if (!txpl)
 - return;

   s = splnet();



Re: openbsd hard disk information

2011-06-27 Thread David Gwynne
On 27/06/2011, at 9:31 PM, Friedrich Locke wrote:

 Dear list member,

 i have installed OpenBSD on my desktop; everything is ok, except for the
 disk information report.
 It is shown as wd0. I am confused because as far as i know it is a SATA device.

 Why does it (OpenBSD) see it as an old wd?

why not? all you should care about is that you can talk to the blocks on your
disk, how you get to them is hopefully not important.

the reason is sata disks still respond to ata commands, and a lot of sata
controllers appear as largely compatible to a traditional ata controller
supported by the legacy ata stack. just luck i guess :)

dlg



Re: Routing Issue

2011-05-17 Thread David Gwynne
hey david,

pf is run twice on packets going through a box, once before the network stack
and again as it leaves it. this means you have to allow a packet in one side
as well as when it goes out the other.
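A Python sketch of the point above, assuming a toy ruleset of `(action, direction, interface)` tuples evaluated with pf's last-match rule and its implicit pass when nothing matches (`quick` is ignored). A routed packet must survive both evaluations: in on the receiving interface and out on the sending one. The function names are hypothetical.

```python
def pf_allows(rules, direction, iface):
    """Last matching rule wins; with no match pf's default is to pass."""
    verdict = 'pass'
    for action, rule_dir, rule_if in rules:
        # None acts as a wildcard for direction or interface
        if rule_dir in (None, direction) and rule_if in (None, iface):
            verdict = action
    return verdict == 'pass'


def forwarded(rules, in_if, out_if):
    """A routed packet is filtered twice: in on in_if, then out on out_if."""
    return pf_allows(rules, 'in', in_if) and pf_allows(rules, 'out', out_if)


rules = [('block', None, None),   # block all
         ('pass', 'in', 'sis0')]  # pass in on sis0 only
print(forwarded(rules, 'sis0', 'sis1'))  # False: dropped on the way out
rules.append(('pass', 'out', 'sis1'))
print(forwarded(rules, 'sis0', 'sis1'))  # True
```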

dlg

On 17/05/2011, at 10:16 PM, David Schulz wrote:

 Hi all,

 i have a LAN within a LAN and the setup is as follows:

 192.168.1.0/24 -- OpenBSD 4.9 Router with 2 NICS -- 10.1.0.0/21

 My goal is to get both sides talking to each other (let's start with making
 them able to ping each other). I got it working by using the following
 pf.conf; however, I thought I should not need those match out
 statements, because OpenBSD routes packets between interfaces by default as
 long as sysctl net.inet.ip.forwarding=1 is set.

 From inside my OpenBSD box I can ping devices on either side just fine. From a
 machine sitting on either side, I can ping the OpenBSD box just fine. But I
 simply cannot get Side A machines to talk to Side B machines unless I
 uncomment the two match out statements below in my pf.conf.

 If someone could share some insight, id be most thankful.

 regards,
 D

 Here is my simplified pf.conf, which again does not work unless I uncomment the
 two match out rules:
  pf.conf
 int_if=sis0
 ext_if=sis1

 icmp_types = { echoreq, unreach }

 set require-order yes
 set block-policy return
 set optimization normal
 set loginterface $ext_if

 match in all scrub (no-df)

 set skip on lo

 #match out on $int_if from 192.168.1.0/24 to any nat-to ($int_if)
 #match out on $ext_if from 10.1.0.0/21 to any nat-to ($ext_if)

 block log all

 #Simplified for 'making it work purposes'
 pass out quick
 pass in quick

 antispoof quick for { lo0 $int_if $ext_if } inet

 # allow ICMP
 pass in quick on { $int_if $ext_if } inet proto icmp all icmp-type $icmp_types keep state
 

  route -n
 cndlne001'root(~) route -n show | grep default
 default   10.1.3.1   UGS023106 - 8   sis0

 cndlne001'root(~) route -n show | grep 192.168.1
 192.168.1/24   link#2 UC 20 - 4   sis1



Re: impact of unaligned partitions/slices on 4kB sector drives (wd10ears)

2011-05-14 Thread David Gwynne
On 14/05/2011, at 6:43 PM, Abel Abraham Camarillo Ojeda wrote:

 I'm starting to get angry about the _horrible_ performance of this drive
 (WD10EARS-00Y). Has any developer ever had a chance to look at
 this?

don't get angry, it's just a disk.

we changed the default alignment of partitions on all disks to mitigate this
problem. the only issue you may have with a default install on one of these
drives is a small fragment size on the ffs partitions.

i have had a look at querying disks for their physical and logical block
alignments and offsets, but the WD??EARS-00? drives dont report this info.
according to western digital, the next generation of these drives
(WD??EARS-11? iirc) are supposed to report them. if i ever find a disk that
does report the physical to logical alignment, i might have a look at having
the system make use of those values.
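The alignment arithmetic behind this, as a small Python sketch assuming 512-byte logical and 4096-byte physical sectors: a partition is aligned when its byte offset is a multiple of 4096, and an ffs fragment smaller than 4096 bytes can force the drive into read-modify-write cycles. The 63-sector start in the usage line is the classic DOS offset, used only as an example.

```python
LOGICAL = 512    # bytes per sector the drive reports
PHYSICAL = 4096  # bytes per sector the drive really uses (assumed)


def partition_aligned(start_sector, logical=LOGICAL, physical=PHYSICAL):
    """True if a partition starting at start_sector begins on a physical sector."""
    return (start_sector * logical) % physical == 0


def fragment_ok(fsize, physical=PHYSICAL):
    """True if an ffs fragment covers whole physical sectors."""
    return fsize % physical == 0


print(partition_aligned(63), partition_aligned(64))  # False True
print(fragment_ok(2048), fragment_ok(4096))          # False True
```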

huggz,
dlg


 The original message is at:

 http://marc.info/?l=openbsd-tech&m=126281899324219&w=2

 (I wasn't subscribed to this list back then)

 Thanks.

 OpenBSD 4.9-current (kobj) #0: Sun May  1 14:32:33 CDT 2011
root@maetel.00z:/usr/kobj
 real mem = 1608056832 (1533MB)
 avail mem = 1551196160 (1479MB)
 mainbus0 at root
 bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xfd400 (50 entries)
 bios0: vendor American Megatrends Inc. version V1.3 date 11/15/2010
 bios0: MSI MS-7623
 acpi0 at bios0: rev 0
 acpi0: sleep states S0 S3 S4 S5
 acpi0: tables DSDT FACP APIC MCFG OEMB SRAT HPET SSDT
 acpi0: wakeup devices PCE2(S4) PCE3(S4) PCE4(S4) PCE5(S4) PCE6(S4)
 PCE7(S4) PCE9(S4) PCEA(S4) PCEB(S4) PCEC(S4) SBAZ(S4) PSKE(S4)
 PSMS(S4) ECIR(S4) PS2K(S3) PS2M(S3) P0PC(S4) UHC1(S4) UHC2(S4)
 UHC3(S4) USB4(S4) UHC5(S4) UHC6(S4) UHC7(S4) PWRB(S3)
 acpitimer0 at acpi0: 3579545 Hz, 32 bits
 acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
 cpu0 at mainbus0: apid 0 (boot processor)
 cpu0: AMD Phenom(tm) II X4 955 Processor, 3200.77 MHz
 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
 cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
 cpu0: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
 cpu0: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
 cpu0: apic clock running at 200MHz
 cpu1 at mainbus0: apid 1 (application processor)
 cpu1: AMD Phenom(tm) II X4 955 Processor, 3200.16 MHz
 cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
 cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
 cpu1: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
 cpu1: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
 cpu2 at mainbus0: apid 2 (application processor)
 cpu2: AMD Phenom(tm) II X4 955 Processor, 3200.15 MHz
 cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
 cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
 cpu2: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
 cpu2: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
 cpu3 at mainbus0: apid 3 (application processor)
 cpu3: AMD Phenom(tm) II X4 955 Processor, 3200.16 MHz
 cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
 cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
 cpu3: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
 cpu3: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
 ioapic0 at mainbus0: apid 4 pa 0xfec00000, version 21, 24 pins
 acpimcfg0 at acpi0 addr 0xe0000000, bus 0-255
 acpihpet0 at acpi0: 14318180 Hz
 acpiprt0 at acpi0: bus 0 (PCI0)
 acpiprt1 at acpi0: bus 1 (P0P1)
 acpiprt2 at acpi0: bus -1 (PCE2)
 acpiprt3 at acpi0: bus -1 (PCE3)
 acpiprt4 at acpi0: bus -1 (PCE4)
 acpiprt5 at acpi0: bus 2 (PCE5)
 acpiprt6 at acpi0: bus -1 (PCE6)
 acpiprt7 at acpi0: bus -1 (PCE7)
 acpiprt8 at acpi0: bus -1 (PCE9)
 acpiprt9 at acpi0: bus -1 (PCEA)
 acpiprt10 at acpi0: bus -1 (PCEB)
 acpiprt11 at acpi0: bus -1 (PCEC)
 acpiprt12 at acpi0: bus 3 (P0PC)
 acpicpu0 at acpi0: PSS
 acpicpu1 at acpi0: PSS
 acpicpu2 at acpi0: PSS
 acpicpu3 at acpi0: PSS
 acpibtn0 at acpi0: PWRB
 pci0 at mainbus0 bus 0
 pchb0 at pci0 dev 0 function 0 AMD RS780 Host rev 0x00
 ppb0 at pci0 dev 1 function 0 AMD RS780 PCIE rev 0x00
 pci1 at ppb0 bus 1
 vga1 at pci1 dev 5 function 0 vendor ATI, unknown product 0x9616 rev 0x00
 wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
 wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
 ppb1 at 

4.9 firewalls

2011-05-11 Thread David Gwynne
anyone replaced firewalls with 4.9 boxes yet? noticed a difference?



Re: pfsync bulk transfer performance

2011-05-05 Thread David Gwynne
when doing a bulk update pfsync only generates 100 packets a second. each
packet will be filled with as many full state update messages as possible.

unfortunately the full state update message is about 264 bytes so you can only
fit 5 in a packet. that means 5 * 100 or 500 messages a second, which means
60000 / 500 seconds, ie, a minimum of 2 minutes.
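The arithmetic above, as a Python sketch. It assumes a 1500-byte MTU and 40 bytes of IP and pfsync header overhead per packet (an assumption chosen so that 5 messages of 264 bytes fit, as stated above); with 60,000 states, the figure from the question below, it gives 120 seconds. Pending non-bulk updates sharing each packet only make the real figure worse.

```python
import math


def bulk_update_seconds(states, mtu=1500, overhead=40, msg_size=264,
                        pkts_per_sec=100):
    """Lower bound on a pfsync bulk update, ignoring competing updates."""
    per_packet = (mtu - overhead) // msg_size  # 1460 // 264 == 5 messages
    per_second = per_packet * pkts_per_sec      # 5 * 100 == 500 messages/s
    return math.ceil(states / per_second)


print(bulk_update_seconds(60000))  # 120, i.e. 2 minutes
```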

to make this worse, pfsync won't make a new packet just for bulk updates, it
will fill the current packet every 100th of a second. if the master has pending
updates to send, you'll fit even fewer full update messages in a frame. if the
master is reasonably busy you'll always have pending updates.

i do this on my firewalls sometimes:

root@passive ~# ssh master pfctl -S /dev/stdout | pfctl -L /dev/stdin

it's a bit faster...

dlg

On 05/05/2011, at 1:23 AM, Kapetanakis Giannis wrote:

 Hi,

 I'd like to ask if it's normal for pfsync bulk transfer to take 5-15
 minutes to end for 60k states.

 pfsync is on a dedicated gigabit interface on both firewalls.

 May  4 17:59:35 fw1 /bsd: carp: pfsync0 demoted group carp by 1 to 131
 (pfsync bulk start)
 May  4 17:59:35 fw1 /bsd: carp: pfsync0 demoted group pfsync by 1 to 1
 (pfsync bulk start)
 May  4 18:13:47 fw1 /bsd: carp: pfsync0 demoted group carp by -1 to 0
 (pfsync bulk done)
 May  4 18:13:47 fw1 /bsd: carp: pfsync0 demoted group pfsync by -1 to 0
 (pfsync bulk done)

 Stats on this interface show 967 pkts/sec, 1421128 bytes/sec
 Iperf  gives me 850Mbps from fw2 to fw1

 fw1 is -current, fw2 is 4.9 -stable (kudos for another excellent release!)

 regards,

 Giannis




Re: pfsync bulk transfer performance

2011-05-05 Thread David Gwynne
On 05/05/2011, at 10:27 PM, Kapetanakis Giannis wrote:

 On 05/05/11 13:37, David Gwynne wrote:
 i do this on my firewalls sometimes:

 root@passive ~# ssh master pfctl -S /dev/stdout | pfctl -L /dev/stdin

 it's a bit faster...

 dlg


 I've tried your trick and it took just a second to copy the states.
 However, it still took 10 minutes to show pfsync bulk done (75k states).

neither firewall knows you copied the states behind pfsync's back, so the
master will keep sending them, and the backup will wait for the bulk update
complete message.

after the pfctl magic both firewalls will have the same states though, so you
can fail over safely.

dlg



Re: use DUIDs rather than device names in fstab?

2011-04-29 Thread David Gwynne
this is why i like duids:

OpenBSD 4.9-current (GENERIC.MP) #1: Fri Apr 29 14:55:51 EST 2011
d...@hotspare.eait.uq.edu.au:/home/dlg/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 137428045824 (131061MB)
avail mem = 133755645952 (127559MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xdf79c000 (103 entries)
bios0: vendor Dell Inc. version 1.3.1 date 10/05/2010
bios0: Dell Inc. PowerEdge R815
acpi0 at bios0: rev 2
acpi0: sleep states S0 S4 S5
acpi0: tables DSDT FACP APIC SPCR HPET MCFG WD__ SLIC ERST HEST BERT EINJ IV__ SRAT SLIT SS__ TCPA
acpi0: wakeup devices PCI0(S5) PCI1(S5)
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Opteron(tm) Processor 6128, 2000.28 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu0: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu0: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu0: apic clock running at 200MHz
cpu1 at mainbus0: apid 48 (application processor)
cpu1: AMD Opteron(tm) Processor 6128, 2000.04 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu1: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu1: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu2 at mainbus0: apid 32 (application processor)
cpu2: AMD Opteron(tm) Processor 6128, 2000.04 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu2: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu2: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu3 at mainbus0: apid 16 (application processor)
cpu3: AMD Opteron(tm) Processor 6128, 2000.04 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu3: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu3: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu4 at mainbus0: apid 1 (application processor)
cpu4: AMD Opteron(tm) Processor 6128, 2000.04 MHz
cpu4: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu4: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu4: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu4: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu5 at mainbus0: apid 49 (application processor)
cpu5: AMD Opteron(tm) Processor 6128, 2000.04 MHz
cpu5: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu5: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu5: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu5: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu6 at mainbus0: apid 33 (application processor)
cpu6: AMD Opteron(tm) Processor 6128, 2000.04 MHz
cpu6: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu6: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu6: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu6: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu7 at mainbus0: apid 17 (application processor)
cpu7: AMD Opteron(tm) Processor 6128, 2000.04 MHz
cpu7: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu7: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu7: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu7: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu8 at mainbus0: apid 2 (application processor)
cpu8: AMD Opteron(tm) Processor 6128, 2000.04 MHz
cpu8:

Re: use DUIDs rather than device names in fstab?

2011-04-29 Thread David Gwynne
On 29/04/2011, at 4:48 PM, Otto Moerbeek wrote:


 On 29 Apr 2011, at 07:00, David Gwynne l...@animata.net wrote:

 this is why i like duids:

 Is this what you get when you max out every option when ordering a machine?

no...



Re: use DUIDs rather than device names in fstab?

2011-04-28 Thread David Gwynne
On 29/04/2011, at 3:33 AM, Nick Holland wrote:

 On 04/28/2011 10:58 AM, Bryan wrote:
 On Wed, Apr 27, 2011 at 19:55, David Gwynne l...@animata.net wrote:
 amen.

 anything that helps us get away from the kernel's arbitrary numbering of
 devices to identify disks is a good thing.

 dlg


 Would there be a reason why you wouldn't use DUIDs?  Do some older
 drives not support it, or some archs not support this?

 There's no issue of drive age or platform.

as nick says, this isn't a disk-dependent thing. the duid is stored in the
disklabel, so it works on any block device where the kernel can read a
disklabel. obviously you can end up with duplicate duids (eg, by dding one disk
to another), which can be a bit confusing, but we can only go so far in
protecting people from themselves. there are lots of worse things you can do
with disks and dd...

anyway, one of the nice things about openbsd is that it tries to be as
consistent as possible between architectures. mounting partitions by duid Just
Works(tm) everywhere now.
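For illustration, a DUID-based fstab(5) entry names the partition as the 16 hex digit disklabel uid, a dot, and a partition letter. The C sketch below checks a device field for that shape; looks_like_duid_spec is a hypothetical helper, the example uid is made up, and the 'a' to 'p' range assumes 16 partitions per label.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: does an fstab device field look like
 * "<16 hex digit duid>.<partition>", e.g. "3eb7f9da875cb9ee.a"?
 * Assumes partitions 'a'..'p' (16 per label).
 */
int
looks_like_duid_spec(const char *spec)
{
	size_t i;

	if (strlen(spec) != 18)		/* 16 hex digits + '.' + letter */
		return 0;
	for (i = 0; i < 16; i++) {
		if (!isxdigit((unsigned char)spec[i]))
			return 0;
	}
	if (spec[16] != '.')
		return 0;
	return spec[17] >= 'a' && spec[17] <= 'p';
}
```

A matching string says nothing about uniqueness: two disks cloned with dd carry the same uid and both pass this check.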

dlg



Re: use DUIDs rather than device names in fstab?

2011-04-27 Thread David Gwynne
amen.

anything that helps us get away from the kernel's arbitrary numbering of
devices to identify disks is a good thing.

dlg

On 28/04/2011, at 10:20 AM, Nick Holland wrote:

 On 04/27/11 08:27, Kent Watsen wrote:
 Maybe you should tell us what happened and what you were expecting.

 I saw the check-in which stated that it was being turned on to see what
 response there is, which is all I'm doing...

 When installing on a system only having IDE-based drives, I was
 expecting to not be prompted, since I don't believe it's easy for the
 boot drive to change location.  But I realize now that some flash card
 adapters present themselves as an IDE device, which makes them as
 portable as my USB pen drive.

 Your imagination doesn't go very far :)

 ever add an IDE or SATA card to a machine that was working Just Fine
 before?  It could be a lot of fun.  Even with DUIDs, it will take a
 bit of adjusting of drives and/or boot ROMs sometimes.

 I used to manage a machine which archived e-mail, it would fill its
 available disk space every few months, and we'd take one (of many) sets
 of disks out and put another in.  While superficially an easy task, the
 shuffling of cables for the three PCI SATA cards to the six drive packs
 was annoying and time consuming to get right...and we did it that way
 because it was a LOT easier than using the two onboard SATA ports plus
 two off-board, since nothing numbered the way we wished.  Would have
 been nice to have had this feature then.

 Nick.



Re: 4.7 ospfd FIB/RIB synchronization

2011-04-20 Thread David Gwynne
you might be able to upgrade your passive firewall to 4.9 next to the active
4.7 one. it looks like the protocol stayed the same so they should be able to
talk to each other.

however, it looks like bulk updates were broken in 4.7, which would explain
your failover problems. you can work around that by running ssh activefw pfctl -S
/dev/stdout | pfctl -L /dev/stdin as root on the passive fw.

as a matter of interest, are you using ospf for failover on one side of your
firewalls?

dlg

On 20/04/2011, at 2:45 PM, Jonathan Lassoff wrote:

 On Tue, Apr 19, 2011 at 7:14 PM, David Gwynne l...@animata.net wrote:
 i had this same problem and fixed it in time for the 4.8 release. is it
possible you can upgrade?

 Do you mean that this was an issue in 4.7 that was fixed in 4.8?

 I most definitely plan to upgrade (all the way to 4.9, most likely),
 but am stuck with 4.7 for now, since there's not a hitless way for me
 to upgrade right now (mostly due to pfsync causing sessions to reset
 when failing over).

 Thanks for the pointer.

 Cheers,
 jof



Re: 4.7 ospfd FIB/RIB synchronization

2011-04-20 Thread David Gwynne
On 20/04/2011, at 11:08 PM, Jonathan Lassoff wrote:

 On Wed, Apr 20, 2011 at 4:22 AM, David Gwynne l...@animata.net wrote:
 you might be able to upgrade your passive firewall to 4.9 next to the
active 4.7 one. it looks like the protocol stayed the same so they should be
able to talk to each other.

 This would seem to be the case.

 This (http://undeadly.org/cgi?action=article&sid=20090301211402) is an
 absolutely excellent bit of writing about the improvements to pfsync,
 BTW. Thanks for letting that be shared.

 however, it looks like bulk updates were broken in 4.7, which would explain
your failover problems. you can work around that by running ssh activefw pfctl -S
/dev/stdout | pfctl -L /dev/stdin as root on the passive fw.

 As an initial seeding of state? It seems to me that only some of my
 flows get affected when failing over (not everything is reset and
 traffic can still flow).

yes. the pfctl commands will do a bulk update since the in kernel
implementation was unreliable back then.

 It appears that both firewalls have an approximately congruent set of
 states, but usually a pfctl -ss | wc -l can be off by several
 hundred, to several thousand states at times. My hunch is that state
 creation and counter updates are not updated synchronously, so when
 failing over there are still some updates in-flight, and for flows
 that are moving their sequence numbers at a decent clip I could see
 why they might get reset.

pf has a bit of fuzz when it does its tcp window matching, so packets can get
ahead of the firewall and be ok. also, pf will drop out of window packets
rather than send RSTs and such. pfsync will also make a good effort to merge
state updates with local changes and will aggressively send updates to its
peers when it thinks traffic has recently gone over both legs of a firewall.

however, if the bulk update didn't work properly then some states can be
missing after failover. if a state doesn't exist then the packet falls through
to the ruleset; pfsync doesn't ask its peers for missing states. this used to
affect me with very long lived connections that could be idle for a while (eg, nfs).

 Have you ever used pfsync with the defer option set? I can imagine
 that it just takes longer for sessions to start since each firewall
 would have to wait for the insertion of the state on the other
 firewall, but I wonder how much latency that adds in practice.

i wrote defer, so yes...

on my boxes the increase in latency is about .2 to .3ms. if a firewall is
missing its peer(s) it will go up to about 1/100th of a second.

 Another open question would be what to do in the case of multiple
 firewalls receiving the multicast update (not applicable for me, but
 something I'm considering trying). I wonder if there ought to be a
 hook for defer to count the number of related received state insertion
 messages it gets before starting.

the code assumes that if one peer got and acked the update, then all your
peers got the update.
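A toy C model of the defer behaviour described above: the packet that created a state is released when the first peer acknowledges the insert, or after a timeout of about 1/100th of a second if no peer answers. The 10 ms constant comes from the message; the function and its inputs are hypothetical and this is not the pfsync code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of pfsync "defer": the packet that created a state is
 * held until a peer acks the state insert, or until a timeout fires.
 * The first ack is taken to mean every peer has the state.
 */
#define DEFER_TIMEOUT_MS	10.0	/* "about 1/100th of a second" */

/*
 * ack_ms[i] is when peer i acked, in ms after the packet arrived;
 * a negative value means that peer never answered.
 * Returns when the held packet would be released.
 */
double
defer_release_ms(const double *ack_ms, size_t npeers)
{
	double release = DEFER_TIMEOUT_MS;
	size_t i;

	for (i = 0; i < npeers; i++) {
		/* the earliest ack wins; later acks change nothing */
		if (ack_ms[i] >= 0 && ack_ms[i] < release)
			release = ack_ms[i];
	}
	return release;
}
```

With a healthy peer acking after 0.25 ms the packet waits 0.25 ms; with no peer at all it waits the full 10 ms.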

 as a matter of interest, are you using ospf for failover on one side of
your firewalls?

 I'm hooking CARP interfaces up into ospfd to signal to my IGP which
 firewall is active at a given time. ospfd seems to have hooks into
 CARP which will change LSA metrics based on the CARP state.

 For the interfaces that these firewalls are announcing into the IGP,
 CARP is used to direct upstream traffic at the active router.

that's exactly how i have my stuff configured.

dlg



Re: 4.7 ospfd FIB/RIB synchronization

2011-04-19 Thread David Gwynne
i had this same problem and fixed it in time for the 4.8 release. is it
possible you can upgrade?

On 20/04/2011, at 9:10 AM, Jonathan Lassoff wrote:

 I'm having a bit of an issue with OpenOSPFd on 4.7 running on i386
hardware.

 The gist of the problem is that it seems that changes to the kernel
 routing table and/or interfaces are not being synchronized into the
 OSPF RIB and LSDB.

 As an example, I have a CARP interface called carp17 that is
 configured in /etc/ospfd.conf, and routed like so:

 # ifconfig carp17
 carp17: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:00:5e:00:01:11
description: Foobarbaz CARP
priority: 0
carp: BACKUP carpdev vlan17 vhid 17 advbase 1 advskew 100
groups: carp
inet6 fe80::200:5eff:fe00:111%carp17 prefixlen 64 scopeid 0x2f
inet X.X.186.161 netmask 0xfffffff0 broadcast X.X.186.175

 # netstat -rn | grep X.X.186.160 | grep carp
 X.X.186.160/28     link#47            C          0        0     -     4 carp17

 # ospfctl show fib | grep .186.160
 C    4 X.X.186.160/28       link#47
 *O  32 X.X.186.160/28       X.X.191.21

 # ospfctl show rib | grep .186.160
 X.X.186.160/28       X.X.191.21        Intra-Area   Network   20  11:50:38

 So, it's configured with the network X.X.186.160/28.
 If I try and re-configure this to be a /29:

 # ifconfig carp17 inet X.X.186.161 netmask 0xfffffff8

 ospfd's FIB reflects the change:

 # ospfctl show fib | grep 186.160
 *O  32 X.X.186.160/28       X.X.191.21
 C    4 X.X.186.160/29       link#47

 But the RIB does not:

 # ospfctl show rib | grep 186.160
 X.X.186.160/28       X.X.191.21        Intra-Area   Network   20  12:09:25

 I've tried an ospfctl fib reload to no avail. The RIB still doesn't
 reflect the change:

 # ospfctl show rib | grep 186.160
 X.X.186.160/28       X.X.191.21        Intra-Area   Network   20  12:11:39



 Is there something I could be missing or doing wrong? Should FIB
 synchronization into OSPF work with 4.7?
 I'm going off of the changelog and this mailinglist entry:
 http://marc.info/?l=openbsd-misc&m=127616167503271

 Cheers,
 jof
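To double-check the /28 to /29 change in the question, a short C sketch can derive the netmask, network and broadcast address from a prefix length. The addresses in the original are elided as X.X, so the sketch uses a hypothetical 10.0.186.161 stand-in; only the last octet matters for this comparison.

```c
#include <assert.h>
#include <stdint.h>

/* netmask for a prefix length; plen must be 0..32 (28 -> 0xfffffff0) */
uint32_t
prefix_to_mask(unsigned int plen)
{
	if (plen == 0)
		return 0;	/* shifting a 32-bit value by 32 is undefined */
	return (uint32_t)0xffffffff << (32 - plen);
}

/* first address of the subnet containing addr */
uint32_t
network_of(uint32_t addr, unsigned int plen)
{
	return addr & prefix_to_mask(plen);
}

/* last (broadcast) address of the subnet containing addr */
uint32_t
broadcast_of(uint32_t addr, unsigned int plen)
{
	return addr | (uint32_t)~prefix_to_mask(plen);
}
```

It yields 0xfffffff0 for /28 (.160 to .175) and 0xfffffff8 for /29 (.160 to .167), matching the ifconfig netmasks in the question.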



Re: new upper limit with BIGMEM

2011-04-05 Thread David Gwynne
OpenBSD 4.9-current (GENERIC.MP) #36: Mon Apr  4 09:39:35 EST 2011
d...@hotspare.eait.uq.edu.au:/home/dlg/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 137428045824 (131061MB)
avail mem = 133755703296 (127559MB)

seems to work ok...



Re: network bandwith with em(4)

2011-02-24 Thread David Gwynne
i'd like to reiterate ryan's advice to have a look at the systat mbuf output.

as he said, mclgeti will try to protect the host by restricting the number of
packets placed on the rx rings. it turns out you don't need (or can't use) a lot
of packets on the ring, so bumping the ring size is a useless tweak. mclgeti
simply won't let you fill all those descriptors.

if you were allowed to fill all 2048 entries on your modified rings, that
would just mean you spend more time in the interrupt handler pulling packets
off these rings and freeing them immediately because you have no time to
process them. ie, increasing the ring size would actually slow down your
forwarding rate if mclgeti was disabled.
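A toy C model of the argument: if the driver may only keep a current watermark (cwm) of buffers on the rx ring, and cwm backs off under livelock, then a bigger ring changes nothing. The lwm/cwm names follow the LWM/CWM figures Ryan mentions; the watermark values and the halve-on-livelock, grow-by-one policy are arbitrary choices for this sketch, not the kernel's algorithm.

```c
#include <assert.h>

/* toy model of a watermark-limited rx ring (not the kernel's code) */
struct rx_model {
	unsigned int ring_size;	/* descriptors the hardware ring has */
	unsigned int lwm;	/* never fill fewer than this */
	unsigned int hwm;	/* never fill more than this */
	unsigned int cwm;	/* current fill limit */
};

/* called once per tick with whether a livelock was detected */
void
rx_model_tick(struct rx_model *m, int livelocked)
{
	if (livelocked) {
		/* back off hard so the host gets cpu time back */
		m->cwm /= 2;
		if (m->cwm < m->lwm)
			m->cwm = m->lwm;
	} else if (m->cwm < m->hwm) {
		/* recover slowly while things are healthy */
		m->cwm++;
	}
}

/* descriptors the driver will actually fill */
unsigned int
rx_model_fill(const struct rx_model *m)
{
	return m->cwm < m->ring_size ? m->cwm : m->ring_size;
}
```

In this model a 2048 entry ring and a 256 entry ring with the same hwm fill exactly the same number of descriptors; only cwm decides.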

cheers,
dlg

On 25/02/2011, at 9:41 AM, Ryan McBride wrote:

 On Wed, Feb 23, 2011 at 06:07:16PM +0100, Patrick Lamaiziere wrote:
 I log the congestion counter (each 10s) and there are at max 3 or 4
 congestions per day. I don't think the bottleneck is pf.

 The congestion counter doesn't directly mean you have a bottleneck in
 PF; it's triggered by the IP input queue being full, and could indicate
 a bottleneck in other places as well, which PF tries to help out with by
 dropping packets earlier.


 Interface errors?

 Quite a lot.

 The output of `systat mbufs` is worth looking at, in particular the
 figure for LIVELOCKS, and the LWM/CWM figures for the interface(s) in
 question.

 If the livelocks value is very high, and the LWM/CWM numbers are very
 small, it is likely that the MCLGETI interface is protecting your system
 from being completely flattened by forcing the em card to drop packets
 (supported by your statement that the error rate is high). If it's bad
 enough MCLGETI will be so effective that the pf congestion counter will
 not get incremented.


 You mentioned the following in your initial email:

 #define MAX_INTS_PER_SEC 8000

 Do you think I can increase this value? The interrupt rate of the
 machine is at max ~60% (top).

 Increasing this value will likely hurt you. 60% interrupt rate sounds
 about right to me for a firewall system that is running at full tilt;
 100% interrupt is very bad, if your system spends all cycles servicing
 interrupts it will not do very much of anything useful.
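To put numbers on MAX_INTS_PER_SEC, here is a C sketch that converts an interrupt rate cap into an interrupt throttling value. It assumes the em(4) style formula 1000000000 / (MAX_INTS_PER_SEC * 256), i.e. an ITR register counting in 256 ns units; treat that as our reading of the driver rather than a quote from it.

```c
#include <assert.h>

/*
 * Assumption: the chip's interrupt throttling (ITR) register counts
 * in 256 ns units, as the em(4) DEFAULT_ITR formula implies.
 */
#define ITR_UNIT_NS	256u

/* ITR value for a given interrupts-per-second cap */
unsigned int
itr_for_rate(unsigned int ints_per_sec)
{
	return 1000000000u / (ints_per_sec * ITR_UNIT_NS);
}

/* minimum gap between interrupts in microseconds (truncated) */
unsigned int
min_gap_us(unsigned int ints_per_sec)
{
	return 1000000u / ints_per_sec;
}
```

At 8000 interrupts a second the ITR value is 488 and interrupts are at least 125 us apart; doubling the cap to 16000 gives 244 and 62 us, so at full load the CPU has to field twice as many interrupts.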


 dmesg:
 em0 at pci5 dev 0 function 0 Intel PRO/1000 QP (82571EB) rev
 0x06: apic 1 int 13 (irq 14), address 00:15:17:ed:98:9d

 em4 at pci9 dev 0 function 0 Intel PRO/1000 QP (82575GB) rev 0x02:
 apic 1 int 23 (irq 11), address 00:1b:21:38:e0:80

 How about a _full_ dmesg, so someone can take a wild guess at what
 your machine is capable of?

 -Ryan



Re: Dell R310 - H200 Raid performance problem

2011-02-20 Thread David Gwynne
i believe the diff below should work out of the box. it pulls in
all mikeb's fixes.

On Fri, Feb 18, 2011 at 07:54:09PM +0100, Łukasz Czarniecki wrote:
 With following Mike's suggestions it worked.
 
 
 # scsi -f /dev/rsd0c -m 8
 IC:  0
 ABPF:  0
 CAP:  0
 DISC:  0
 SIZE:  0
 WCE:  1
 MF:  0
 RCD:  0
 Demand Retention Priority:  0
 Write Retention Priority:  0
 Disable Pre-fetch Transfer Length:  65535
 Minimum Pre-fetch:  0
 Maximum Pre-fetch:  65280
 Maximum Pre-fetch Ceiling:  65535
 FSW:  0
 LBCSS:  0
 DRA:  0
 Vendor-specific:  0
 NV_DIS:  0
 Number of Cache Segments:  15
 Cache Segment Size:  0
 
 how to manipulate write cache policy?

the lsi firmwares don't implement handling of the mode page changes
unfortunately. you could call the ioctl this implements yourself
though from userland.

Index: mpii.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/mpii.c,v
retrieving revision 1.37
diff -u -p -r1.37 mpii.c
--- mpii.c  29 Dec 2010 03:55:09 -0000  1.37
+++ mpii.c  20 Feb 2011 09:18:58 -0000
@@ -29,6 +29,7 @@
 #include <sys/kernel.h>
 #include <sys/rwlock.h>
 #include <sys/sensors.h>
+#include <sys/dkio.h>
 #include <sys/tree.h>
 
 #include <machine/bus.h>
@@ -981,6 +982,51 @@ struct mpii_msg_sas_oper_reply {
u_int32_t   ioc_loginfo;
 } __packed;
 
+struct mpii_msg_raid_action_request {
+   u_int8_t    action;
+#define MPII_RAID_ACTION_CHANGE_VOL_WRITE_CACHE    (0x17)
+   u_int8_t    reserved1;
+   u_int8_t    chain_offset;
+   u_int8_t    function;
+
+   u_int16_t   vol_dev_handle;
+   u_int8_t    phys_disk_num;
+   u_int8_t    msg_flags;
+
+   u_int8_t    vp_id;
+   u_int8_t    vf_if;
+   u_int16_t   reserved2;
+
+   u_int32_t   reserved3;
+
+   u_int32_t   action_data;
+#define MPII_RAID_VOL_WRITE_CACHE_MASK     (0x03)
+#define MPII_RAID_VOL_WRITE_CACHE_DISABLE  (0x01)
+#define MPII_RAID_VOL_WRITE_CACHE_ENABLE   (0x02)
+
+   struct mpii_sge action_sge;
+} __packed;
+
+struct mpii_msg_raid_action_reply {
+   u_int8_t    action;
+   u_int8_t    reserved1;
+   u_int8_t    chain_offset;
+   u_int8_t    function;
+
+   u_int16_t   vol_dev_handle;
+   u_int8_t    phys_disk_num;
+   u_int8_t    msg_flags;
+
+   u_int8_t    vp_id;
+   u_int8_t    vf_if;
+   u_int16_t   reserved2;
+
+   u_int16_t   reserved3;
+   u_int16_t   ioc_status;
+
+   u_int32_t   action_data[5];
+} __packed;
+
 struct mpii_cfg_hdr {
u_int8_tpage_version;
u_int8_tpage_length;
@@ -1256,6 +1302,11 @@ struct mpii_cfg_raid_vol_pg0 {
 #define MPII_CFG_RAID_VOL_0_STATUS_RESYNC  (1<<16)
 
u_int16_t   volume_settings;
+#define MPII_CFG_RAID_VOL_0_SETTINGS_CACHE_MASK        (0x30)
+#define MPII_CFG_RAID_VOL_0_SETTINGS_CACHE_UNCHANGED   (0x00)
+#define MPII_CFG_RAID_VOL_0_SETTINGS_CACHE_DISABLED    (0x10)
+#define MPII_CFG_RAID_VOL_0_SETTINGS_CACHE_ENABLED     (0x20)
+
u_int8_t    hot_spare_pool;
u_int8_t    reserved1;
 
@@ -1972,6 +2023,8 @@ int   mpii_req_cfg_page(struct mpii_softc
 
 int        mpii_get_ioc_pg8(struct mpii_softc *);
 
+int        mpii_ioctl_cache(struct scsi_link *, u_long, struct dk_cache *);
+
 #if NBIO > 0
 int        mpii_ioctl(struct device *, u_long, caddr_t);
 int        mpii_ioctl_inq(struct mpii_softc *, struct bioc_inq *);
@@ -4650,19 +4703,123 @@ mpii_scsi_cmd_done(struct mpii_ccb *ccb)
 
mpii_push_reply(sc, ccb->ccb_rcb);
scsi_done(xs);
-}
+}
 
 int
 mpii_scsi_ioctl(struct scsi_link *link, u_long cmd, caddr_t addr, int flag)
 {
struct mpii_softc   *sc = (struct mpii_softc *)link->adapter_softc;
+   struct mpii_device  *dev = sc->sc_devs[link->target];
 
DNPRINTF(MPII_D_IOCTL, "%s: mpii_scsi_ioctl\n", DEVNAME(sc));
 
-   if (sc->sc_ioctl)
-       return (sc->sc_ioctl(link->adapter_softc, cmd, addr));
-   else
-       return (ENOTTY);
+   switch (cmd) {
+   case DIOCGCACHE:
+   case DIOCSCACHE:
+       if (dev != NULL && ISSET(dev->flags, MPII_DF_VOLUME)) {
+           return (mpii_ioctl_cache(link, cmd,
+               (struct dk_cache *)addr));
+       }
+       break;
+
+   default:
+       if (sc->sc_ioctl)
+           return (sc->sc_ioctl(link->adapter_softc, cmd, addr));
+
+       break;
+   }
+
+   return (ENOTTY);
+}
+
+int
+mpii_ioctl_cache(struct scsi_link *link, u_long cmd, struct dk_cache *dc)
+{
+   struct mpii_softc *sc = (struct mpii_softc *)link->adapter_softc;
+   struct mpii_device *dev = sc->sc_devs[link->target];
+   struct mpii_cfg_raid_vol_pg0 *vpg;
+   struct 

Re: Dell R310 - H200 Raid performance problem

2011-02-17 Thread David Gwynne
this diff implements the disk cache ioctl handling in mpii so sd(4)
can drive the change rather than have mpii(4) whack everything.
modelled on the same functionality in mpi(4) and mikeb's code...

could someone test this please?
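Only the start of mpii_ioctl_cache survives in this archive, so the following is a hedged C sketch of how the constants added by the diff could map a volume's write cache setting to and from the DIOCGCACHE/DIOCSCACHE view. The #define values are copied from the diff; vol_wrcache_enabled and vol_wrcache_action are hypothetical helpers, and treating CACHE_UNCHANGED as "not enabled" is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* values copied from the diff */
#define MPII_CFG_RAID_VOL_0_SETTINGS_CACHE_MASK		(0x30)
#define MPII_CFG_RAID_VOL_0_SETTINGS_CACHE_UNCHANGED	(0x00)
#define MPII_CFG_RAID_VOL_0_SETTINGS_CACHE_DISABLED	(0x10)
#define MPII_CFG_RAID_VOL_0_SETTINGS_CACHE_ENABLED	(0x20)

#define MPII_RAID_VOL_WRITE_CACHE_DISABLE	(0x01)
#define MPII_RAID_VOL_WRITE_CACHE_ENABLE	(0x02)

/* read side (DIOCGCACHE): is the volume write cache on? */
int
vol_wrcache_enabled(uint16_t volume_settings)
{
	/* only the cache bits matter; UNCHANGED counts as not enabled */
	return (volume_settings & MPII_CFG_RAID_VOL_0_SETTINGS_CACHE_MASK) ==
	    MPII_CFG_RAID_VOL_0_SETTINGS_CACHE_ENABLED;
}

/* write side (DIOCSCACHE): action_data for the RAID action request */
uint32_t
vol_wrcache_action(int enable)
{
	return enable ? MPII_RAID_VOL_WRITE_CACHE_ENABLE :
	    MPII_RAID_VOL_WRITE_CACHE_DISABLE;
}
```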

Index: mpii.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/mpii.c,v
retrieving revision 1.37
diff -u -p -r1.37 mpii.c
--- mpii.c  29 Dec 2010 03:55:09 -0000  1.37
+++ mpii.c  18 Feb 2011 06:54:58 -0000
@@ -29,6 +29,7 @@
 #include <sys/kernel.h>
 #include <sys/rwlock.h>
 #include <sys/sensors.h>
+#include <sys/dkio.h>
 #include <sys/tree.h>
 
 #include <machine/bus.h>
@@ -981,6 +982,52 @@ struct mpii_msg_sas_oper_reply {
u_int32_t   ioc_loginfo;
 } __packed;
 
+struct mpii_msg_raid_action_request {
+   u_int8_t    action;
+#define MPII_RAID_ACTION_CHANGE_VOL_WRITE_CACHE    (0x17)
+   u_int8_t    reserved1;
+   u_int8_t    chain_offset;
+   u_int8_t    function;
+
+   u_int16_t   vol_dev_handle;
+   u_int8_t    phys_disk_num;
+   u_int8_t    msg_flags;
+
+   u_int8_t    vp_id;
+   u_int8_t    vf_if;
+   u_int16_t   reserved2;
+
+   u_int32_t   reserved3;
+
+   u_int32_t   action_data;
+#define MPII_RAID_VOL_WRITE_CACHE_DISABLE  (0x01)
+#define MPII_RAID_VOL_WRITE_CACHE_ENABLE   (0x02)
+
+   struct mpii_sge action_sge;
+} __packed;
+
+struct mpii_msg_raid_action_reply {
+   u_int8_t    action;
+   u_int8_t    reserved1;
+   u_int8_t    chain_offset;
+   u_int8_t    function;
+
+   u_int16_t   vol_dev_handle;
+   u_int8_t    phys_disk_num;
+   u_int8_t    msg_flags;
+
+   u_int8_t    vp_id;
+   u_int8_t    vf_if;
+   u_int16_t   reserved2;
+
+   u_int16_t   reserved3;
+   u_int16_t   ioc_status;
+
+   u_int32_t   action_data[5];
+
+   struct mpii_sge action_sge;
+} __packed;
+
 struct mpii_cfg_hdr {
u_int8_tpage_version;
u_int8_tpage_length;
@@ -1256,6 +1303,11 @@ struct mpii_cfg_raid_vol_pg0 {
 #define MPII_CFG_RAID_VOL_0_STATUS_RESYNC  (1<<16)
 
u_int16_t   volume_settings;
+#define MPII_CFG_RAID_VOL_0_SETTINGS_CACHE_MASK        (0x30)
+#define MPII_CFG_RAID_VOL_0_SETTINGS_CACHE_UNCHANGED   (0x00)
+#define MPII_CFG_RAID_VOL_0_SETTINGS_CACHE_DISABLED    (0x10)
+#define MPII_CFG_RAID_VOL_0_SETTINGS_CACHE_ENABLED     (0x20)
+
u_int8_t    hot_spare_pool;
u_int8_t    reserved1;
 
@@ -1972,6 +2024,8 @@ int   mpii_req_cfg_page(struct mpii_softc
 
 int        mpii_get_ioc_pg8(struct mpii_softc *);
 
+int        mpii_ioctl_cache(struct scsi_link *, u_long, struct dk_cache *);
+
 #if NBIO > 0
 int        mpii_ioctl(struct device *, u_long, caddr_t);
 int        mpii_ioctl_inq(struct mpii_softc *, struct bioc_inq *);
@@ -4650,19 +4704,113 @@ mpii_scsi_cmd_done(struct mpii_ccb *ccb)
 
mpii_push_reply(sc, ccb->ccb_rcb);
scsi_done(xs);
-}
+}
 
 int
 mpii_scsi_ioctl(struct scsi_link *link, u_long cmd, caddr_t addr, int flag)
 {
struct mpii_softc   *sc = (struct mpii_softc *)link->adapter_softc;
+   struct mpii_device  *dev = sc->sc_devs[link->target];
 
DNPRINTF(MPII_D_IOCTL, "%s: mpii_scsi_ioctl\n", DEVNAME(sc));
 
-   if (sc->sc_ioctl)
-       return (sc->sc_ioctl(link->adapter_softc, cmd, addr));
-   else
-       return (ENOTTY);
+   switch (cmd) {
+   case DIOCGCACHE:
+   case DIOCSCACHE:
+       if (dev != NULL && ISSET(dev->flags, MPII_DF_VOLUME)) {
+           return (mpii_ioctl_cache(link, cmd,
+               (struct dk_cache *)addr));
+       }
+       break;
+
+   default:
+       if (sc->sc_ioctl)
+           return (sc->sc_ioctl(link->adapter_softc, cmd, addr));
+
+       break;
+   }
+
+   return (ENOTTY);
+}
+
+int
+mpii_ioctl_cache(struct scsi_link *link, u_long cmd, struct dk_cache *dc)
+{
+   struct mpii_softc *sc = (struct mpii_softc *)link->adapter_softc;
+   struct mpii_device *dev = sc->sc_devs[link->target];
+   struct mpii_cfg_raid_vol_pg0 *vpg;
+   struct mpii_msg_raid_action_request *req;
+   struct mpii_cfg_hdr hdr;
+   struct mpii_ccb *ccb;
+   u_int32_t addr = MPII_CFG_RAID_VOL_ADDR_HANDLE | dev->dev_handle;
+   size_t pagelen;
+   int rv = 0;
+   int enabled;
+
+   if (mpii_req_cfg_header(sc, MPII_CONFIG_REQ_PAGE_TYPE_RAID_VOL, 0,
+       addr, 0, &hdr) != 0)
+       return (EINVAL);
+
+   pagelen = hdr.page_length * 4;
+   vpg = malloc(pagelen, M_TEMP, M_WAITOK | M_CANFAIL | M_ZERO);
+   if (vpg == NULL)
+       return (ENOMEM);
+
+   if (mpii_req_cfg_page(sc, addr, 0, &hdr, 1, 

Re: pf commands to discuss

2011-01-20 Thread David Gwynne
either:

pass in log (all) on $int_if inet proto udp from $admin_pc to !$int_if \
 port 33433 >< 33626 keep state tag mytracert

pass out log on $ext_if inet proto udp from $ext_if to any \
 port 33433 >< 33626 keep state tagged mytracert

or:

pass in log (all) on $int_if inet proto udp from $admin_pc to !$int_if \
 port 33433 >< 33626 keep state

pass out log on $ext_if inet proto udp from $ext_if to any \
 port 33433 >< 33626 keep state received-on $int_if

there are some other ways too, but i like these the most.
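Written with pf's exclusive range operator, port 33433 >< 33626 matches ports strictly between the two numbers, i.e. 33434 to 33625. A C sketch of that check, plus why the range fits a classic UDP traceroute: assuming a base port of 33434, at most 64 hops and 3 probes per hop, with the destination port bumped once per probe (defaults we believe OpenBSD's traceroute uses), the last probe goes to 33625. The function names are hypothetical.

```c
#include <assert.h>

/* pf.conf "port lo >< hi": matches ports strictly between lo and hi */
int
port_in_exclusive_range(unsigned int port, unsigned int lo, unsigned int hi)
{
	return port > lo && port < hi;
}

/*
 * Highest destination port a classic UDP traceroute would use,
 * assuming it starts at base_port and bumps the port once per probe.
 */
unsigned int
traceroute_last_port(unsigned int base_port, unsigned int max_ttl,
    unsigned int nprobes)
{
	return base_port + max_ttl * nprobes - 1;
}
```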

dlg

On 20/01/2011, at 6:17 PM, Indunil Jayasooriya wrote:

 Hi list,
 
 
 I have a question. I want my PC (i.e. admin_pc), which is behind an OpenBSD
 4.8 pf firewall (doing NAT), to be able to traceroute. So, I have added the
 rules below to my pf.conf file.
 
 
 match out on $ext_if from $lan_net nat-to ($ext_if)
 
 pass in log (all) on $int_if inet proto udp from $admin_pc to !$int_if \
  port 33433 >< 33626 keep state
 
 pass out log on $ext_if inet proto udp from $ext_if to any \
  port 33433 >< 33626 keep state
 
 
 Because of the above rules, my PC can traceroute. It works fine. *But*, in
 addition to that, the firewall itself can also traceroute because of the above
 *pass out* rule. I *do NOT* want the firewall to be able to traceroute.
 
 my question is: how can I stop the firewall itself from being able to do
 it?
 
 
 
 
 
 
 
 -- 
 Thank you
 Indunil Jayasooriya


