Re: [patch] cleaner checksum modification for pf

2015-06-22 Thread Richard Procter

On 16/06/2015, at 1:09 PM, Richard Procter wrote:

 - I was unable to test af-to, which does a lot of packet fiddling.

I've now tested this without obvious issue. Note: a couple of 
one-line changes in icmp af-translation remain untested. Details 
attached.

 - I haven't tested modification of unaligned TCP options,
   for SACK and timestamp. 

This tested without issue, too.

Also, some points raised in off-list comments suggested I was unclear in my
patch post, which I'll address here.

- Note this diff does not revert all of Henning's checksum work; it just returns
to checksum modification for pf-altered packets, as in 5.3[0], but without the 
ugly nested pf_cksum_fixup() calls and without (I expect) any loss of 
efficiency. I've aimed for the minimal necessary changes only.

The patch is also made much easier now that pseudo-header checksums are
calculated late in the output path, since pf now never sees a partial checksum.
Worst-case, pf does some unnecessary arithmetic when altering locally generated
packets, but no more than the original 5.3 code did, as the 5.3 modification
code didn't distinguish between local and forwarded packets either.

- The first patch renames pf_change_a() to pf_change_32_unaligned() as it was
not being used for addresses alone, and I needed to move it aside for an
address-specific pf_change_a().

- Some commented that, although the patch re-enables end-to-end verification of
transport payloads traversing pf, this is unimportant as we now have far stronger
end-to-end checks in ssh and TLS.

These protocols, however, assume a reliable transport[1]: they terminate the
connection on a failed MAC, treating it as evidence of malicious tampering[*]. If
they didn't, they would need to reimplement TCP's retransmission strategies, as
the Internet can be expected to damage packets as a matter of course. So TCP
checksums continue to play an important role for secure streams built atop TCP,
and this diff bears upon them, too. 


As usual, comments, questions, or criticism are welcome.

best, 
Richard.
 
[0] 5.3, not 5.4 as I stated
[1] for those interested in the fine print: 
rfc4253 (2006) SSH p3
rfc5246 (2008) TLSv1.2 p3
[*] IIRC someone last year reported mysterious ssh disconnects due to corrupted
MACs, which Stuart Henderson pointed out could have been due to a flaky NAT
router.

Test details:

- unaligned SACK, timestamp options
- tested on i386; I expect no issues for architectures with stronger 
  alignment constraints, but I aim to test this over the weekend on socppc 
  for kicks. 

- pf_translate_icmp_af
These two remain untested due to my lack of test nodes:
nextmtu for ICMP6_PACKET_TOO_BIG 
pptr for ICMP_PARAMPROB
but these are low risk as they are one-line substitutions using tested primitives, 
and nextmtu is always changed.

- translate quoted address family (af-to)
via pf_change_icmp_af pf_change_ap

Testing af-to 4 -> 6 
pass in on ral0 inet from any to 192.168.2.2 af-to inet6 from 
  fec0:0:0:2::1 to fec0:0:0:2::2

   af-to TCP
- inet4 telnet to inet6 netcat host [TESTED]
   af-to ICMP
- unassociated with connection
 - src ICMP4 ping [TESTED]
- associated with connection
 - traceroute elicits ICMP6 port unreachable
   from dest; translated result inspected with wireshark [TESTED]

Testing af-to 6 -> 4 
pass in on vr0 inet6 from any to 64:ff9b::/96 af-to inet from 192.168.2.3

   af-to TCP
- inet6 telnet to inet4 netcat host [TESTED]
   af-to ICMP
- unassociated with connection
 - src ICMP6 ping [TESTED]
- associated with connection
 - traceroute6 elicits ICMP4 port unreachable
   from dest; translated result inspected with wireshark. [TESTED]






Re: [patch] cleaner checksum modification for pf

2015-06-16 Thread Richard Procter

On 17/06/2015, at 4:26 AM, Mike Belopuhov wrote:
 
 You didn't read the pf.conf manual page carefully:
 
   [...] Because address family translation
   overrides the routing table, it's only possible to use af-to on
   inbound rules, and a source address for the resulting
   translation must always be specified.
 
 And all example rules after that use pass in.

Thanks for the pointer, I'll have another go at testing this then. 

best, 
Richard. 




[patch] cleaner checksum modification for pf

2015-06-15 Thread Richard Procter
Hi,

These patches against HEAD reinstate the pf algorithm of OpenBSD 5.4
for preserving payload checksums end-to-end, rewritten without the
ugly and error-prone (but speedy!) nested pf_cksum_fixup() calls.

I have been running this code on a small Alix (i386) IPv4 gateway for a
month with no obvious issues. To test as many of the affected features
as possible, its pf.conf included:

  match scrub (random-id) 
  match on egress scrub (max-mss 1440, reassemble tcp)
  match out on egress from !egress:network nat-to egress:0 
  pass out on egress inet proto tcp modulate state 
  pass in inet proto tcp from any to egress port ... rdr-to ...

I've tried to avoid significant performance impact on modern hardware,
and don't expect any, but have not tested this. I've aimed for
simplicity in the first instance and there is scope for optimisation if
necessary.

I've attached my test notes below, covering every change. Note:

 - I was unable to test af-to, which does a lot of packet fiddling.
   I've never used it before and was unable to get it working on a 
   generic kernel. I figure I'm just missing something. I used the line

   pass out on vr0 inet af-to inet6 from fec0:0:0:2::1 to fec0:0:0:2::2

   but although inet4 tcp connection attempts were translated 
   to fec0:0:0:2::2, the SYNs received RSTs from the 
   router, fec0:0:0:2::1.

 - My inet6 testing was limited to two nodes connected via the alix
   router.

 - I've assumed that rdr-to is good if nat-to tests ok as the code
   paths look virtually identical.

 - I haven't tested modification of unaligned TCP options,
   for SACK and timestamp, but have tested the unaligned paths of the
   change primitives.

 - the patch includes a small fix for pf_pdesc_setup to set up the
   pdesc protocol checksum for ICMPv6

The patch is in three commits to ease review, to be applied in order:

 0. rename pf_change_a -> pf_change_32_unaligned to better reflect its
 use 
 1. reinstate pf_cksum_fixup sans nesting 
 2. avoid unnecessary calls to pf_change_32_unaligned

...but if another format is easier, let me know.

This patch should be examined closely --- it's been a while since I've
worked at this level and I've never worked on OpenBSD code.

I'm keen to hear comments, questions or criticisms.

best, 
Richard.  

 - [G] means that errors here are expected 
to have been exposed in the course of running on the gateway. 

 - [O] means these changes involve primitives tested elsewhere.

pf_tcp_track_full, pf_create_state [G]
- modulate sequence number  (modulate state)
 pf_change_a, old unaligned copy -> pf_change_32
 new pf_change_a

pf_translate 
- translate address and port, if any 
AF_INET [G]
AF_INET6 [TESTED between two addresses UDP, TCP, ICMP6]
 
- translate ICMP icmp_id for ICMP_ECHO [TESTED]
pf_change_16
- translate ICMP6 icmp6_id for ICMP6_ECHO [TESTED]
pf_change_16
- translate address for non TCP,UDP,ICMP,ICMP6 protocol [TESTED]

 pf_change_a, old unaligned copy -> pf_change_32
new pf_change_a

pf_modulate_sack 
- modulate SACK sequence numbers [G]
pf_change_32_unaligned [need to test unaligned options]
minor refactoring to support fixup

pf_test_state_icmp
- translate ICMP unrelated to another connection, e.g. ECHO [TESTED]
- translate icmp_id
- translate address
new pf_change_a 

- ICMP error related to another connection, e.g. dest unreachable
- translate address, quoted address, port (nat, rdr-to) 
via pf_change_icmp, see this 

- translate quoted address family (af-to) [*]
via pf_change_icmp_af
pf_change_ap

TCP 
   - demodulate quoted TCP sequence number [TESTED]
pf_change_a, old unaligned -> pf_change_32
UDP [TESTED]
   - zero quoted UDP checksum
pf_change_16

pf_change_icmp [O]
- change quoted protocol port and address, if any 
- change outer ip address 
pf_cksum_fixup_a
pf_cksum_fixup
pf_change_a   

pf_change_icmp_af
- replace quoted IPv4 / IPv6 headers with converse AF. 
pf_cksum_cover / uncover

pf_translate_icmp_af [O] 
- to AF_INET
- sets icmp type, code
  nextmtu for ICMP6_PACKET_TOO_BIG
  pptr for ICMP_PARAMPROB
- to AF_INET6
- sets icmp type, code
  nextmtu for ICMP_UNREACH_NEEDFRAG
  ptr for ICMP_PARAMPROB

pf_change_8 [TESTED when testing alternate 

[Patch] pf refactoring

2015-08-16 Thread Richard Procter
Hi, 

This series of 29 small diffs slims pf.o by 2640 bytes and pf.c by
113 non-comment lines.

pf_translate(), in particular, is now much shorter and clearer[0].

I've tested it by running on my home router (alix, i386, inet4) for a
week or so without issue, and by profiling it under stress. I saw no
significant performance impact, and it's, possibly, even a little more
efficient. See below for the profile details.

The patch's length is due to its conservatism; the cumulative patch 
is much smaller.

At each step, I attempted a correctness argument, and some of these follow
the calculational approach of Dijkstra and others. For background, see
e.g. EWD1300 and EWD1123 (for the interested, I've found A.J.M. van
Gasteren's 'On the Shape of Mathematical Arguments' a stellar read, and
also recommend EWD1023). I've found it a useful exercise, particularly
as the effort needed to make the argument indicates the effort required
to review the diff, and several times I was led to take smaller bites.


best, 
Richard. 

PS. If you, reading this, like the patch, or even if you don't, 
please drop me a line to let me know. Thanks! 

[0] As pf_translate() is called only on state creation it isn't
terribly performance sensitive, either.

Profiling -

Setup: netcat a 415MB file through the profiled router. 

  [A] -- 192.168.3.2 vr1 -- [router] -- vr0 192.168.1.2 -- [B]

(vr0 is part of a bridge.)

pf.conf contains
match on vr1 scrub (random-id, reassemble tcp)
match out on vr0 inet from !vr0:network nat-to $router
pass

before patch: real 1m18.118s 
 after patch: real 1m17.282s, 1m18.150s, 1m17.716s

before patch:

                1.04    4.92    1303400/1303400     pf_test [9]
[13]     5.9    1.04    4.92    1303400             pf_test_state [13]
                0.89    2.10    1301332/1301332     pf_tcp_track_full [19]
                0.36    0.69    1303400/1304361     pf_find_state [43]
                0.29    0.06    1301273/1951905     m_copyback [63]
                0.20    0.00    3257446/5872085     pf_addrcpy [74]
                0.08    0.08     650646/650647      pf_change_a [93]
                0.10    0.06     650646/650647      pf_change_ap [95]

after patch:
(note, profiler spent less time in the idle loop before the test began, 
so the time percentages 5.9, 6.9 aren't comparable between these runs.)

                0.96    4.88    1301679/1301679     pf_test [7]
[13]     6.9    0.96    4.88    1301679             pf_test_state [13]
                0.89    1.99    1301628/1301628     pf_tcp_track_full [19]
                0.39    0.59    1301679/1301680     pf_find_state [44]
                0.19    0.16    1301580/1301582     pf_change_a [74]
                0.28    0.05    1301576/1952366     m_copyback [64]
                0.13    0.06    1301580/1301582     pf_change_16 [87]
                0.15    0.00    2603358/5857533     pf_addrcpy [75]

Patch 

To apply against HEAD (pf.c:1.935, pfvar.h:1.419):

First apply the checksum patch I posted to tech@ 
Re: [patch] cleaner checksum modification for pf, 
Message-Id: 98bb5cd8-7b74-4a79-a333-70749e3bc...@gmail.com. 
(I'm also happy to regenerate what follows directly against HEAD 
if necessary.)

then, 

# cd /usr/src/sys
# cat - | patch 

- add pf_change_p(), which changes the port, and use it instead of pf_change_ap()
  when changing the port only.

No functional change: all replaced pf_change_ap() calls leave the address unchanged.

Argument:

(note: afto == (pd->af != nk->af))

All replaced pf_change_ap() executed under afto with arg naf := nk->af   
=   { pf_change_ap() } 
All replaced executions of pf_change_a() executed 
with args af := pd->af, naf := nk->af, under afto which implies af != naf 
=   { afto and pf_change_a() guarded with 'if (af != naf) return;' }
All replaced executions of pf_change_a() are no-op 
=   { pf_change_ap() } 
All replaced pf_change_ap() leave address unchanged

[Context=15]  

Index: net/pf.c
===
--- net.orig/pf.c
+++ net/pf.c
@@ -139,30 +139,31 @@ union pf_headers {

struct pool  pf_src_tree_pl, pf_rule_pl, pf_queue_pl;
struct pool  pf_state_pl, pf_state_key_pl, pf_state_item_pl;
struct pool  pf_rule_item_pl, pf_sn_item_pl;

void pf_init_threshold(struct pf_threshold *, u_int32_t,
u_int32_t);
void pf_add_threshold(struct pf_threshold *);
int  pf_check_threshold(struct pf_threshold *);
void pf_cksum_fixup(u_int16_t *, u_int16_t, u_int16_t,
u_int8_t);
void pf_cksum_fixup_a(u_int16_t *, const struct pf_addr *,
const struct pf_addr *, sa_family_t, u_int8_t);
void pf_change_32(struct pf_pdesc *, u_int32_t *,
 

Re: [patch] cleaner checksum modification for pf

2015-07-18 Thread Richard Procter
Hi, 

On 16/06/2015, at 1:09 PM, Richard Procter wrote:
 - I was unable to test af-to, which does a lot of packet fiddling.
 I've now tested this without obvious issue. 

I neglected checksum regeneration within icmp af-to, which masked a 
couple of icmp af-to errata in my last patch.

I've re-included the entire patch refreshed against HEAD below. 
(Thanks to whoever mentioned 'quilt' the other day!) 

Two further diffs then 0) fix the errata and 1) reintroduce checksum 
modification for icmp af-to. 

I see no remaining regeneration cases in PF. 

Note: Checksumless IPv4 UDP packets, illegal under IPv6, are now 
checksummed on af-to IPv6. This improves on HEAD. 

Note: pf_translate_af() flushes pd->pcksum to the mbuf by flushing the  
entire transport header. Simple but possibly suboptimal; you may
wish to do it another way.  

testing: 

$4 IPv4 -> $6 IPv6 
TCP:                         ssh $4 -- af-to $6 [good]
ICMPv4->v6:                  ping $4 -- af-to $6 [good]
UDP, ICMPv6->v4 quoting UDP: traceroute $4 -- af-to $6 [good] 
Checksumless UDP:            traceroute -x $4 -- af-to $6 [good] 

$6 IPv6 -> $4 IPv4
TCP:                         ssh $6 -- af-to $4 [good]
ICMPv6:                      ping6 $6 -- af-to $4 [good]
UDP, ICMPv4->v6 quoting UDP: traceroute6 $6 -- af-to $4 [good]

best, 
Richard. 

To apply: 
# cd /src/sys/net
# cat - | patch 

--- Rename pf_change_a() -> pf_change_32_unaligned() to 
prepare for address-specific pf_change_a()

Index: net/pf.c
===
--- net.orig/pf.c
+++ net/pf.c
@@ -1664,7 +1664,7 @@ pf_change_ap(struct pf_pdesc *pd, struct
 
 /* Changes a u_int32_t.  Uses a void * so there are no align restrictions */
 void
-pf_change_a(struct pf_pdesc *pd, void *a, u_int32_t an)
+pf_change_32_unaligned(struct pf_pdesc *pd, void *a, u_int32_t an)
 {
if (pd->csum_status == PF_CSUM_UNKNOWN)
pf_check_proto_cksum(pd, pd->off, pd->tot_len - pd->off,
@@ -2273,10 +2273,10 @@ pf_modulate_sack(struct pf_pdesc *pd, st
for (i = 2; i + TCPOLEN_SACK <= olen;
i += TCPOLEN_SACK) {
memcpy(&sack, &opt[i], sizeof(sack));
-   pf_change_a(pd, &sack.start,
+   pf_change_32_unaligned(pd, &sack.start,
htonl(ntohl(sack.start) -
dst->seqdiff));
-   pf_change_a(pd, &sack.end,
+   pf_change_32_unaligned(pd, &sack.end,
htonl(ntohl(sack.end) -
dst->seqdiff));
memcpy(&opt[i], &sack, sizeof(sack));
@@ -3484,7 +3484,7 @@ pf_create_state(struct pf_pdesc *pd, str
if ((s->src.seqdiff = pf_tcp_iss(pd) - s->src.seqlo) ==
0)
s->src.seqdiff = 1;
-   pf_change_a(pd, &th->th_seq,
+   pf_change_32_unaligned(pd, &th->th_seq,
htonl(s->src.seqlo + s->src.seqdiff));
*rewrite = 1;
} else
@@ -3680,12 +3680,12 @@ pf_translate(struct pf_pdesc *pd, struct
 #endif /* INET6 */
} else {
if (PF_ANEQ(saddr, pd->src, pd->af)) {
-   pf_change_a(pd, &pd->src->v4.s_addr,
+   pf_change_32_unaligned(pd, &pd->src->v4.s_addr,
saddr->v4.s_addr);
rewrite = 1;
}
if (PF_ANEQ(daddr, pd->dst, pd->af)) {
-   pf_change_a(pd, &pd->dst->v4.s_addr,
+   pf_change_32_unaligned(pd, &pd->dst->v4.s_addr,
daddr->v4.s_addr);
rewrite = 1;
}
@@ -3745,12 +3745,12 @@ pf_translate(struct pf_pdesc *pd, struct
switch (pd->af) {
case AF_INET:
if (!afto && PF_ANEQ(saddr, pd->src, pd->af)) {
-   pf_change_a(pd, &pd->src->v4.s_addr,
+   pf_change_32_unaligned(pd, &pd->src->v4.s_addr,
saddr->v4.s_addr);
rewrite = 1;
}
if (!afto && PF_ANEQ(daddr, pd->dst, pd->af)) {
-   pf_change_a(pd, &pd->dst->v4.s_addr,
+   pf_change_32_unaligned(pd, &pd->dst->v4.s_addr,
daddr->v4.s_addr);
rewrite = 1;
}
@@ -3813,8 +3813,8 @@ pf_tcp_track_full(struct pf_pdesc *pd, s
while ((src->seqdiff = arc4random

[patch] tsec(4): enable TX interrupt coalescing

2015-11-09 Thread Richard Procter
Hi, 

This reduces tsec(4) TX interrupts by over a factor of four per interface,
boosting throughput by a couple of percent for

$ dd if=/dev/zero bs=4096 | nc ${host} ${port}

It does this by reducing TX interrupt notifications to one per frame, from
one per mbuf fragment, and by enabling TX interrupt coalescing.

I've chosen conservative coalescing parameters. The card now interrupts every
four tx frames, leaving the tx ring fuller on average. But ample room remains
on the card's tx ring of 256 descriptors, which can hold 16 frames in the 
worst case of 16 mbuf fragments per frame. Testing showed descriptor use 
peaking at 13 descriptors under load.

The hold-off timer, ensuring stale frames are not left on the tx ring 
indefinitely, is not crucial for tx: as the frame has already been transmitted, 
latency isn't a concern. It need only last longer than the time to transmit the 
coalesced frames, and I've set it much longer, roughly 2ms for 1000baseT, 
to give the stack some slack when feeding the card.
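
For the record, here is how that register value decomposes and where the
~2ms figure comes from. The field layout and timer units below are my
reading of the eTSEC documentation, not something the driver states, so
treat them as assumptions:

/*
 * Sketch of the TSEC_TXIC value used in the diff (assumed eTSEC layout:
 * ICEN enables coalescing, the frame threshold sits at bit 21, the timer
 * threshold occupies the low 16 bits, and one timer unit is 64 Tx
 * interface clocks).
 */
#define TXIC_FRAMES	0x4	/* hypothetical name: interrupt every 4 frames */
#define TXIC_TIMER	0x1000	/* hypothetical name: 4096 timer units */

/*
 * At 1000baseT the Tx interface clock is 125 MHz, so one timer unit is
 * 64 / 125 MHz = 512 ns, giving a hold-off of
 *
 *	4096 units * 512 ns/unit ~= 2.1 ms
 *
 * which is the "roughly 2ms" quoted above.
 */
#define TXIC_VALUE	(TSEC_TXIC_ICEN | (TXIC_FRAMES << 21) | TXIC_TIMER)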

While here, it also makes tsec_encap() error handling a tad more robust.

Tested on RB600A.

best, 
Richard. 

Index: if_tsec.c
===
RCS file: /cvs/src/sys/arch/socppc/dev/if_tsec.c,v
retrieving revision 1.39
diff -u -p -u -r1.39 if_tsec.c
--- if_tsec.c   6 Nov 2015 11:35:48 -   1.39
+++ if_tsec.c   10 Nov 2015 01:32:31 -
@@ -121,6 +121,8 @@ extern void myetheraddr(u_char *);
 #define TSEC_TCTRL 0x100
 #define TSEC_TSTAT 0x104
#define  TSEC_TSTAT_THLT   0x80000000
+#define TSEC_TXIC  0x110
+#define  TSEC_TXIC_ICEN    0x80000000
 #define TSEC_TBPTR 0x184
 #define TSEC_TBASE 0x204
 
@@ -536,7 +538,7 @@ tsec_start(struct ifnet *ifp)
ifp->if_flags |= IFF_OACTIVE;
break;
} 
-   if (error == EFBIG) {
+   if (error) {
IFQ_DEQUEUE(&ifp->if_snd, m);
m_freem(m); /* give up: drop it */
ifp->if_oerrors++;
@@ -1020,6 +1022,9 @@ tsec_up(struct tsec_softc *sc)
attr |= TSEC_ATTR_RBDSEN;
tsec_write(sc, TSEC_ATTR, attr);
 
+   /* TX interrupts every 4 TSEC_TX_I with ~2ms hold-off @ 1000baseT */
+   tsec_write(sc, TSEC_TXIC, (TSEC_TXIC_ICEN | (0x4 << 21) | 0x1000));
+
tsec_write(sc, TSEC_TSTAT, TSEC_TSTAT_THLT);
tsec_write(sc, TSEC_RSTAT, TSEC_RSTAT_QHLT);
 
@@ -1158,12 +1163,14 @@ tsec_encap(struct tsec_softc *sc, struct
for (i = 0; i < map->dm_nsegs; i++) {
status = txd->td_status & TSEC_TX_W;
status |= TSEC_TX_TO1;
+   status |= TSEC_TX_TC;
if (i == (map->dm_nsegs - 1))
-   status |= TSEC_TX_L;
+   status |= TSEC_TX_L | TSEC_TX_I;
+
txd->td_len = map->dm_segs[i].ds_len;
txd->td_addr = map->dm_segs[i].ds_addr;
__asm volatile("eieio" ::: "memory");
-   txd->td_status = status | TSEC_TX_R | TSEC_TX_I | TSEC_TX_TC;
+   txd->td_status = status | TSEC_TX_R;
 
bus_dmamap_sync(sc->sc_dmat, TSEC_DMA_MAP(sc->sc_txring),
frag * sizeof(*txd), sizeof(*txd), BUS_DMASYNC_PREWRITE);



Re: [PATCH] PF: cksum modification & refactor [3/24]

2015-09-01 Thread Richard Procter
Hi Sasha, 

On 1/09/2015, at 12:47 AM, Alexandr Nedvedicky wrote:
> the code in patch looks good for the first glance. However it seems to me
> the newly introduced pf_cksum_fixup*() are not called yet.  Do you think you
> can reshuffle changes between your set of patches a bit, so the newly
> introduced functions will become alive (get called)?
> 
> Also I think your patch 0/24, you've sent earlier, can be fold here (setting
> pd->pcksum to point to icmp6 header chksum field). 

Sure thing. I'm on a bit of a learning curve with this process. 

The attached is an amalgam of the following patches in my series:

   0) pd->pcksum = icmp6 header cksum
   3) introduce pf_cksum_fixup*() 
   4) introduce pf_change_*() 
  22) misc. transitions to pf_change_*() interface

My notes for the new diffs follow below.

Also, I'd like to explain my last patch adding 'const' to a line deleted
by the next. My aim is to ease review by using the compiler to establish
universals where possible. That helps to eliminate programmer error due to
'not looking hard enough at the code'. I also find it easier on my mental
health :)

You won't, though, want to burn CVS commits on patches that exist only to
show my 'working', so in future I'll bundle these into one email to be committed
as one cumulative patch.

* Introduce pf_change_{8,16,16_unaligned,32}() interface

- modification of checksum-covered data is 
  assignment-with-side-effects. 
- all new functions will be used by later patches

  +ve provides type-appropriate checksum modification
  +ve will replace existing 'altered value' guards, 
  reducing code length
  -ve five new functions in total

C assignment hides behind one assignment operator the nitty gritty of differing
l-value widths. As we cannot change the language to suit our needs, we are
obliged to expose these differences in our interface.

An added wrinkle is that our side-effect, namely, modifying the checksum,
depends on the alignment of the l-value within the packet (the checksum's
summands are 16-bit aligned with respect to the packet). The interface therefore
provides _unaligned() versions parameterised by the l-value's packet alignment,
either 'hi' or 'lo'. Thankfully, these are unnecessary for most protocol fields.
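
To make the 'hi'/'lo' business concrete, here is a standalone userland
sketch (illustrative only, not the patch code) of patching one byte whose
position within its 16-bit, packet-aligned summand is known; the fold is
ordinary ones-complement carry folding, and the UDP no-checksum special
case is omitted:

#include <stdint.h>
#include <arpa/inet.h>

/* Sketch: update a ones-complement checksum for a change to one byte.
 * 'hi' says whether the byte is the high or the low octet of its
 * 16-bit, packet-aligned summand. */
static void
change_byte(uint16_t *cksum, uint8_t *field, uint8_t value, int hi)
{
	uint16_t was = htons(hi ? (uint16_t)(*field << 8) : *field);
	uint16_t now = htons(hi ? (uint16_t)(value << 8) : value);
	uint32_t x = *cksum + was - now;	/* remove old, add new */

	*cksum = ((x >> 16) + (x & 0xffff)) & 0xffff;
	*field = value;
}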

Later patches will augment these functions with 'altered value' guards,
allowing us to replace, e.g.

        if (icmpid != pd->hdr.icmp->icmp_id) {
                if (pd->csum_status == PF_CSUM_UNKNOWN)
                        pf_check_proto_cksum(pd, pd->off,
                            pd->tot_len - pd->off, pd->proto,
                            pd->af);
                pd->hdr.icmp->icmp_id = icmpid;
                rewrite = 1;
        }

with
rewrite += pf_change_16(pd, &pd->hdr.icmp->icmp_id, icmpid);

And, despite the new code, by the end of this series pf is net ~ 500 bytes 
shorter (i386).

Lastly, mikeb@ suggested the name pf_patch_{...}(), which is both shorter and
more descriptive. I like it; however, we follow the preexisting 'change'
terminology for consistency, also noting that 'match' and 'patch' are possibly
undesirably similar. But if you'd rather this patch use pf_patch, I'd be happy
to reissue it. 

* Convert miscellaneous packet modification to pf_change_*() interface

As pf_change_*() modifies the checksum, the checksum status 
needn't be computed: that is used only to decide whether we may regenerate the
checksum later on in pf_cksum() to account for the altered packet. (Other 
parts of the code do not appear to depend on these removed checks.)

Testing: Same code as in my email "[patch] cleaner checksum modification for 
pf", 
see my testing notes there.

Just to be sure/anal retentive, I have retested the more involved changes: 

- normalize tcp->th_x2 -> 0 (via patched hping3) 
- mss clamping (both aligned and unaligned via customised OpenBSD)
and also 
- NAT of icmpv4 ping

This was on i386; I haven't tested on an architecture with stronger alignment 
requirements but expect no problems. 

best, 
Richard. 

Index: net/pf.c
===
--- net.orig/pf.c
+++ net/pf.c
@@ -145,7 +145,10 @@ voidpf_init_threshold(struct 
pf_thre
u_int32_t);
 voidpf_add_threshold(struct pf_threshold *);
 int pf_check_threshold(struct pf_threshold *);
-
+voidpf_cksum_fixup(u_int16_t *, u_int16_t, u_int16_t,
+   u_int8_t);
+voidpf_cksum_fixup_a(u_int16_t *, const struct pf_addr *,
+   const struct pf_addr *, sa_family_t, u_int8_t);
 voidpf_change_ap(struct pf_pdesc *, struct pf_addr *,
u_int16_t *, struct pf_addr *, u_int16_t,
sa_family_t);
@@ -286,6 +289,8 @@ static __inline int 

Re: [PATCH] PF: cksum modification & refactor [3/24]

2015-09-02 Thread Richard Procter

On 2/09/2015, at 4:02 PM, Richard Procter wrote:
> 
> Testing: Same code as in my email "[patch] cleaner checksum modification for 
> pf", 
> see my testing notes there.
> 
> Just to be sure/anal retentive, I have retested the more involved changes: 

But anal retentivity is no substitute for actually thinking...

I forgot about the pf_cksum() calls. These regenerate the checksum, effacing
the effect of any pf_cksum_fixup(). So this patch is only half 'live', and
these tests aren't conclusive.

And this patch won't be unquestionably live until every pf_cksum() is removed.
But they're necessary in the interim for as-yet-unconverted packet modification
points. I guess one could jump through hoops to construct pf_cksum()-free
code paths and a pf.conf that sticks to them. Though as there's no way to
empirically distinguish correct modification from regeneration, besides
manually injecting errors, testing checksum modification prior to the removal
of every pf_cksum() doesn't look worth the trouble.

Alternatively, one could use the complete patch I posted as a proof-of-concept 
in a private tree, or for that matter just unit-test the pf_change() calls. I 
think I'll rustle up some of the latter.

best, 
Richard. 












Re: [PATCH] PF: cksum modification & refactor [3/24]

2015-09-10 Thread Richard Procter

On 3/09/2015, at 10:41 AM, Richard Procter wrote:
> [...] or for that matter just unit-test the pf_change() calls. I 
> think I'll rustle up some of the later.

That was a useful exercise. The pf_change_*() code is fine. 
However, pf_cksum_fixup_a() reliably faulted a couple of times 
over 2E7 random addresses, so it was back to the drawing board. 

It turns out that 5.3's checksum modification algorithm is much 
hairier than it appears and I was fooling myself that I understood it; 
the argument I provided is correct, it just doesn't establish the 
conclusion! The lesson, I guess, is not to re-purpose highly 
optimised code; instead refine something simple.

I've trialed four alternatives on three architectures and will 
cover this in a later patch. In the meantime, unittest code 
can be found here[0].

The attached against HEAD is the same as my last but does 
not include pf_cksum_fixup_a() and adds some comments to 
pf_cksum_fixup().

  0) initialise pd->pcksum = icmp6 header cksum
  3) re-introduce pf_cksum_fixup() (but not pf_cksum_fixup_a())
  4) introduce pf_change_*() 
 22) misc. transitions to pf_change_*() interface

I will be offline for a couple of days. 

best, 
Richard. 


[0] https://203.79.107.124/unittest/unittest.tar.gz


Index: net/pf.c
===
--- net.orig/pf.c
+++ net/pf.c
@@ -145,7 +145,8 @@ void pf_init_threshold(struct 
pf_thre
u_int32_t);
 voidpf_add_threshold(struct pf_threshold *);
 int pf_check_threshold(struct pf_threshold *);
-
+voidpf_cksum_fixup(u_int16_t *, u_int16_t, u_int16_t,
+   u_int8_t);
 voidpf_change_ap(struct pf_pdesc *, struct pf_addr *,
u_int16_t *, struct pf_addr *, u_int16_t,
sa_family_t);
@@ -1651,6 +1652,105 @@ pf_addr_wrap_neq(struct pf_addr_wrap *aw
}
 }
 
+/* This function is deceptively simple, but is in fact an incredibly hairy
+ * optimised special case of an algorithm that emulates 16-bit ones-complement
+ * arithmetic by conserving its carries, which twos-complement otherwise
+ * discards, in the upper 16 bits of x. These accumulated carries when added
+ * to the lower 16-bits over a number of 'reductions' then complete the
+ * ones-complement sum.
+ *
+ * It uses a trick to eliminate a reduction step by emulating in
+ * twos-complement ones-complement subtraction for a single u_int16_t,
+ * thereby reducing the number of accumulated carries to at most one, and
+ * saving one each of +, >>, and ~.
+ *
+ * The emulation works as follows: subtracting exactly one u_int16_t from x
+ * incurs at most one net underflow, wrapping the accumulator to 2^16 - 1,
+ * which, when folded back into the 16-bit sum preserves the ones-complement
+ * borrow:
+ *
+ *((x >> 16) + (x & 0xffff)) & 0xffff
+ *   = { x mod 2^n  =  x & (2^n - 1) }
+ *((x >> 16) + (x mod 2^16)) mod 2^16
+ *   = { def sum, def accumulator }
+ *(accumulator + sum) mod 2^16
+ *   = { one underflow, accumulator := 2^16 - 1 }
+ *((2^16 - 1) + sum) mod 2^16
+ *   = { mod }
+ *(sum - 1) mod 2^16
+ *
+ * The last step accounts for the reduction's trailing '& 0xffff' (which
+ * would, in general, destroy accumulated carries, but not here as a single
+ * add requires at most one reduction).
+ *
+ * And although this breaks for sum = 0, giving 0xffff, which is
+ * ones-complement's other zero, not -1, that cannot occur: underflowing the
+ * sum to zero requires subtracting at least 2^16, which exceeds a single
+ * u_int16_t.
+ */
+void
+pf_cksum_fixup(u_int16_t *cksum, u_int16_t was, u_int16_t now,
+u_int8_t proto)
+{
+   u_int32_t x;
+   const int udp = proto == IPPROTO_UDP;
+
+   if (udp && *cksum == 0x0000)
+   return;
+
+   x = *cksum + was - now;
+   x = ((x >> 16) + (x & 0xffff)) & 0xffff;
+
+   if (udp && x == 0x0000)
+   x = 0xffff;
+
+*cksum = (u_int16_t)(x);
+}
+
+void
+pf_change_8(struct pf_pdesc *pd, u_int8_t *f, u_int8_t v, bool hi)
+{
+   u_int16_t new = htons(hi ? ( v << 8) :  v);
+   u_int16_t old = htons(hi ? (*f << 8) : *f);
+
+   pf_cksum_fixup(pd->pcksum, old, new, pd->proto);
+   *f = v;
+}
+
+/* pre: *f is 16-bit aligned within its packet */
+void
+pf_change_16(struct pf_pdesc *pd, u_int16_t *f, u_int16_t v)
+{
+   pf_cksum_fixup(pd->pcksum, *f, v, pd->proto);
+   *f = v;
+}
+
+void
+pf_change_16_unaligned(struct pf_pdesc *pd, void *f, u_int16_t v, bool hi)
+{
+   u_int8_t *fb = (u_int8_t*)f;
+   u_int8_t *vb = (u_int8_t *)&v;
+
+   if (hi && ALIGNED_POINTER(f, u_int16_t)) {
+   pf_change_16(pd, f, v); /* optimise */
+   

Re: [patch] cleaner checksum modification for pf

2015-09-30 Thread Richard Procter

On 29/09/2015, at 11:14 PM, Alexandr Nedvedicky wrote:

> Hello,
> 
> I've tried Richard's patch on sparc. I took a brief look at its source code.
> It's essentially what PF is doing on Solaris.

Thanks for doing that.

Just for the record, I look at PF's patched checksum handling as follows.

Before Henning's work, the checksum computation (notably the pseudo-header part)
was spread over different parts of the code, and PF would face partially
checksummed packets. But there was no state indicating what the checksum covered,
so when PF altered the packet it could not determine whether or not it should
also modify the checksum, resulting in the rdr-to-localhost issue.

This problem goes away now that checksums either cover what they ought or are
flagged for computation. Now when PF alters a packet it need only modify the
checksum as appropriate for the protocol. And although that's unnecessary
when the checksum is as-yet uncomputed, it does no harm.

> The approach we take on Solaris is as follows:
> [...]

> The things are getting pretty wild in 2b, when PF is doing PBR (policy based
> routing) on outbound packets. Consider situation when IP stack routes packet
> via NIC, which is able to calculate chksum in HW.  IP stack sets flags and
> fields and passes packet to PF. PF changes interface, where packet is bound 
> to,
> to NIC, which is not able to calculate checksum, so the HW-cksum flags set by
> IP stack are no longer valid. In this case we always revert to calculation
> in SW.

As I see it in OpenBSD (maybe Solaris differs) the M_*_CSUM_OUT flags decouple
checksum policy ("whether to") from checksum mechanism ("how to"). So I don't
think of these as HW-cksum flags but as indicating that a needed checksum is
as-yet uncomputed. As that's not specific to an interface, they cannot become 
invalid if the output interface changes.

What they allow is the choice of checksumming method, whether by software or
by hardware (or by small furry animals trained in ones-complement addition),
to be left until late in the output path when the capabilities of the output
interface are known.
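
A minimal sketch of that late decision as I picture it; only the
M_TCP_CSUM_OUT flag is real here, while hw_can_csum() and sw_tcp_cksum()
are made-up stand-ins rather than actual kernel functions:

/* Illustrative only: the flag records policy ("this packet still needs
 * a TCP checksum"); the mechanism is chosen at output time, once the
 * output interface is known. */
void
tcp_cksum_out(struct mbuf *m, struct ifnet *ifp)
{
	if (!(m->m_pkthdr.csum_flags & M_TCP_CSUM_OUT))
		return;			/* checksum already valid */

	if (hw_can_csum(ifp))		/* hypothetical capability test */
		return;			/* leave the flag set; the NIC fills it in */

	sw_tcp_cksum(m);		/* hypothetical software fallback */
	m->m_pkthdr.csum_flags &= ~M_TCP_CSUM_OUT;
}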

> I currently have small suggestion to improve Richard's patch. The macro in
> PF_ALGNMNT() in pfvar.h uses modulo:
> 
>#define PF_HI (true)
>#define PF_LO (!PF_HI)
>#define PF_ALGNMNT(off) (((off) % 2) == 0 ? PF_HI : PF_LO)
> 
> I think we can get away with simple and operation (& 1), which will be faster
> than % on many platforms.
> 
>#define PF_ALGNMNT(off) (((off) & 1) == 0 ? PF_HI : PF_LO)

I've no strong feelings either way here. I note that gcc emits the same code
in either case on ppc and i386. 

best, 
Richard. 




Re: [patch] cleaner checksum modification for pf

2015-09-27 Thread Richard Procter

On 25/09/2015, at 9:33 PM, Stuart Henderson wrote:
> 
> [snip comment; I completely agree]
> 
>> Another (home) router I administer was seeing IIRC five [bad TCP checksums] a
>> day on average over 42 days, something we expect of a globe-spanning 
>> internetwork.
>> Passing one of these damaged segments to the user suffices to break MACs 
>> and drop
>> secure connections.
> 
> While I do generally support this diff as long as it doesn't have
> a big negative impact on performance, the implication of mentioning
> this is that these are packets which PF would pass on to other
> hosts with a re-calculated checksum if the packets were modified
> (nat, scrub etc). But that's not the case because they would be
> checked on input to PF so wouldn't make it that far.

If I implied PF would mask these damaged packets I didn't mean to.  As you
say, PF would drop them when it tried to alter them (nat, scrub, etc).

Rather, the stats show that router faults can't be dismissed as irrelevant in
practice. And just one passed to the user in a secure stream will have a
significant impact, independently of its payload, by breaking the connection.

As to the patch, I have edited it for length and coherence but will hold off 
posting it for a week or two, as I hear there's much work going on elsewhere 
in the network stack. If anyone would like a copy in the meantime please 
contact me privately.

I don't expect a big performance hit, if any, as it works the same way as 5.3 did,
plus or minus a few function calls. Nor could I measure a difference netcatting
a largish file over 100BaseT via my Alix 2d2 running PF doing nat, scrub, etc. 
I'd appreciate more data or reports, positive or negative, though.

best, 
Richard. 
 
P.S. I earlier recommended EWD1023 from memory, that should have been EWD1036. 

 







Re: [patch] cleaner checksum modification for pf

2015-09-24 Thread Richard Procter

On 14/09/2015, at 11:51 PM, Henning Brauer wrote:
> * Martin Pieuchot  [2015-09-11 13:54]:
>> On 11/09/15(Fri) 13:28, Henning Brauer wrote:
>>> Ryan pointed me to this diff and we briefly discussed it; we remain
>>> convinced that the in-tree approach is better than this.
>> Could you elaborate why?
> 
> [ elaboration, quoted further down ] 
> 
> And given that, the approach that has less and simpler code and makes
> better use of offloading wins.

Well, I agree that less and simpler code is better, all else being equal. 
In fact if you'd like my opinion, the in-tree approach actually places a
greater burden of complexity on the programmer. The patch needs one line 
to alter a value

rewrite += pf_change_16(pd, &pd->hdr.icmp->icmp_id, icmpid);

; the in-tree code requires all packet alterations to occur between 
two calls, analogous to the way locking must enclose a critical section:

if (pd->csum_status == PF_CSUM_UNKNOWN)
   pf_check_proto_cksum(pd, pd->off, pd->tot_len - pd->off,
   pd->proto, pd->af); 
   [...]
   header->field = new_value; 
   [...]
pf_cksum(pd, pd->m);

This is the stuff mild headaches are made of. Especially as 1) pf_cksum()
usually occurs elsewhere and 2) the idiom tempts programmers to rely on
existing pf_check_proto_cksum() calls. The latter occurs in three places in the
existing code. I presume these are correct, but it is much easier to avoid the
question by replacing them with the one-liner.

The new interface also provides a place to put all the altered-value guards,
replacing, e.g.:


        if (icmpid != pd->hdr.icmp->icmp_id) {
                if (pd->csum_status == PF_CSUM_UNKNOWN)
                        pf_check_proto_cksum(pd, pd->off,
                            pd->tot_len - pd->off, pd->proto,
                            pd->af);
                pd->hdr.icmp->icmp_id = icmpid;
                rewrite = 1;
        }

with:

rewrite += pf_change_16(pd, &pd->hdr.icmp->icmp_id, icmpid);

This saves code. Here are some numbers. First, non-comment[0] 
lines of code for all affected files:

            pf.c:1.944    pfvar.h:1.420    pf_norm.c:1.182
current           5724             1613               1030
patched           5682             1614               1028
            ----------    -------------    ---------------
                   -88               +1                 -2

(amd64) pf.o  pf_norm.o
current 123520 bytes  30816 bytes
patched 123584 bytes  30912 bytes
- -
  +64   +96

(i386)  pf.o  pf_norm.o 
current 95904 bytes   21096 bytes
patched 94956 bytes   21156 bytes
- -
 -948   -60

The patch's numbers include the new pf_change_* interface and the checksum 
modification code. So the proposed interface pays its way. Ok, let's move 
on to handling checksums:

> Well we've been thru it more than once; the argument presented here
> was that modifying the cksum instead of verify + recalc is better as
> it wouldn't hide cksum mismatches if the cksum engines on the NICs we
> offload to misbehave. After many years with the verify + recalc
> approach I think it is pretty safe to say that this is of no
> concern...

I'm sorry, that's not what I'm saying. Even supposing perfection of every
offload engine, offload engines have no monopoly on faults; regenerated
checksums also mask bus, stack and memory faults in pf-altered packets.

Regeneration always hides faults in /something/, unfortunately. 
pf_check_proto_cksum() takes care to mitigate that by preventing it from
hiding faults in the routers a packet passes through --- except for the current
one, which it can't.

That is, either router faults are a concern or they're not. pf can't
have it both ways, concerned to detect faults in every router other than 
the one on which it is running. The proposed patch fixes that.

And, as I wrote in a previous post, we know that routers corrupt data:

>> Right now my home firewall shows 30 TCP segments dropped for bad checksums.
>> As checks at least as strong are used by every sane link-layer this
>> virtually implies the dropped packets suffered router or end-point faults.

Another (home) router I administer was seeing IIRC five a day on average over
42 days, something we expect of a globe-spanning internetwork. Passing one of
these damaged segments to the user suffices to break MACs and drop secure
connections.

Nonetheless, one might still argue that the reduced reliability is acceptable.
If there were some compelling upside, perhaps it might be, but I've
seen no evidence of performance benefits, and the proposed patch addresses the
code complexity from which the more reliable method suffered.


Lastly, I don't want to sound like I'm trying to beat up on your checksum
work. I know very well how much effort is involved and it's very much
appreciated; you've cleaned up the code considerably. The 

Re: [patch] tsec(4): enable TX interrupt coalescing

2015-12-03 Thread Richard Procter
Refreshed patch against HEAD appears below, 

best, 
Richard. 

PS. Apologies for the quoted-printable encoding...looking now for an email
client capable of 7-bit... 

On 10/11/2015, at 5:18 PM, Richard Procter wrote:

> This reduces tsec(4) TX interrupts by over a factor of four per interface,
> boosting throughput by a couple of percent for
> 
>   $ dd if=/dev/zero bs=4096 | nc ${host} ${port}
> 
> It does this by reducing TX interrupts notifications to one per frame, from
> one per mbuf fragment, and by enabling TX interrupt coalescing.
> 
> I've chosen conservative coalescing parameters. The card now interrupts every
> four tx frames, leaving the tx ring fuller on average. But ample room remains
> on the card's tx ring of 256 descriptors, which can hold 16 frames in the 
> worst case of 16 mbuf fragments per frame. Testing showed descriptor use 
> peaking at 13 descriptors under load.
> 
> The hold-off timer, ensuring stale frames are not left on the tx ring 
> indefinitely, is not crucial for tx: as the frame has already been 
> transmitted, 
> latency isn't a concern. It need only last longer than the time to transmit 
> the 
> coalesced frames, and I've set it much longer, roughly 2ms for 1000baseT, 
> to give the stack some slack when feeding the card.
> 
> While here, also makes tsec_encap() error handling a tad more robust.
> 
> Tested on RB600A.

Index: arch/socppc/dev/if_tsec.c
===
RCS file: /cvs/src/sys/arch/socppc/dev/if_tsec.c,v
retrieving revision 1.42
diff -u -p -U6 -r1.42 if_tsec.c
--- arch/socppc/dev/if_tsec.c   25 Nov 2015 03:09:58 -  1.42
+++ arch/socppc/dev/if_tsec.c   3 Dec 2015 20:54:39 -
@@ -117,12 +117,14 @@ extern void myetheraddr(u_char *);
#define  TSEC_DMACTRL_WOP   0x0001
#define TSEC_TBIPA  0x030

#define TSEC_TCTRL  0x100
#define TSEC_TSTAT  0x104
#define  TSEC_TSTAT_THLT    0x80000000
+#define TSEC_TXIC  0x110
+#define  TSEC_TXIC_ICEN    0x80000000
#define TSEC_TBPTR  0x184
#define TSEC_TBASE  0x204

#define TSEC_RCTRL  0x300
#define  TSEC_RCTRL_PROM0x0008
#define TSEC_RSTAT  0x304
@@ -533,13 +535,13 @@ tsec_start(struct ifnet *ifp)
error = tsec_encap(sc, m, );
if (error == ENOBUFS) {
ifq_deq_rollback(&ifp->if_snd, m);
ifq_set_oactive(&ifp->if_snd);
break;
} 
-   if (error == EFBIG) {
+   if (error) {
ifq_deq_commit(&ifp->if_snd, m);
m_freem(m); /* give up: drop it */
ifp->if_oerrors++;
continue;
}

@@ -1017,12 +1019,15 @@ tsec_up(struct tsec_softc *sc)

attr = tsec_read(sc, TSEC_ATTR);
attr |= TSEC_ATTR_RDSEN;
attr |= TSEC_ATTR_RBDSEN;
tsec_write(sc, TSEC_ATTR, attr);

+   /* TX interrupts every 4 TSEC_TX_I with ~2ms hold-off @ 1000baseT */
+   tsec_write(sc, TSEC_TXIC, (TSEC_TXIC_ICEN | (0x4 << 21) | 0x1000));
+
tsec_write(sc, TSEC_TSTAT, TSEC_TSTAT_THLT);
tsec_write(sc, TSEC_RSTAT, TSEC_RSTAT_QHLT);

/* Configure media. */
if (LIST_FIRST(&sc->sc_mii.mii_phys))
mii_mediachg(&sc->sc_mii);
@@ -1156,18 +1161,20 @@ tsec_encap(struct tsec_softc *sc, struct
BUS_DMASYNC_PREWRITE);

txd = &sc->sc_txdesc[frag];
for (i = 0; i < map->dm_nsegs; i++) {
status = txd->td_status & TSEC_TX_W;
status |= TSEC_TX_TO1;
+   status |= TSEC_TX_TC;
if (i == (map->dm_nsegs - 1))
-   status |= TSEC_TX_L;
+   status |= TSEC_TX_L | TSEC_TX_I;
+
txd->td_len = map->dm_segs[i].ds_len;
txd->td_addr = map->dm_segs[i].ds_addr;
__asm volatile("eieio" ::: "memory");
-   txd->td_status = status | TSEC_TX_R | TSEC_TX_I | TSEC_TX_TC;
+   txd->td_status = status | TSEC_TX_R;

bus_dmamap_sync(sc->sc_dmat, TSEC_DMA_MAP(sc->sc_txring),
frag * sizeof(*txd), sizeof(*txd), BUS_DMASYNC_PREWRITE);

cur = frag;
if (status & TSEC_TX_W) {



Re: removing expired once rules in pf_purge_thread()

2015-12-13 Thread Richard Procter

If I understand this patch:

Rule removal requires the consistency lock but to acquire it one must be 
able to sleep. pf_test_rule() cannot sleep as it executes in a (soft) 
interrupt handler, so it passes the expired rule to a thread which can.

However, the code to remove a 'once' rule may wish to also remove its 
parent rule, now computed in the other thread. The updated patch below 
avoids passing that state between threads in struct pf_rule by explicitly 
passing the parent rule to be purged when necessary.

There also seem to me a couple of issues, one hairier than the other:

 - r->rule_flag should be updated atomically with respect to other  
   (potential) updates (included in the attached patch).
 
 - Multiple rule-matching threads may match the same 'once' rule before
   one of them reaches the code to set PFRULE_EXPIRED. A thread may even 
   match a 'once' rule after PFRULE_EXPIRED is set, if that occurs during 
   the 'once' rule's candidacy as a potential match. (This issue is not 
   addressed by the attached patch.)

   This creates two problems:

a) rules may be purged more than once
b) what to do with packets matched by expired 'once' rules

   In general, we do not know whether or not a 'once' rule candidate is
   the match until a new candidate rule replaces it or the rule matching 
   process completes. Serialising this window of candidacy would prevent a 
   'once' rule becoming a candidate, and possibly the match, for multiple 
   packets at once. That's one approach.

   A weaker approach instead permits multiple matches of one 'once' rule 
   but processes these serially in a critical section afterwards. The 
   first thread to reach it is deemed the first match (note this isn't 
   necessarily the first matching packet received at the interface!). It 
   then atomically marks it PFRULE_EXPIRED and places it on the purge 
   queue. Packets for matches of expired 'once' rule, it seems to me, can 
   be dropped as if we never saw them. An alternative might be to retry 
   them against the updated rules.

   Another possibility would be to require 'once' rules to be 'quick'. 
   This closes the candidacy window and makes its serialisation, to 
   preclude multiple matches, more feasible. 

   Yet another possibility is to drop 'once' rules as too complex to 
   implement for multiprocessor but I have no idea if this is viable.

best, 
Richard. 

Tested via pf.conf

set skip on lo
pass
block in proto icmp

anchor selfdestruct {
pass in proto icmp once no state 
}

Index: net/pf.c
===
--- net.orig/pf.c
+++ net/pf.c
@@ -295,6 +295,9 @@ RB_GENERATE(pf_state_tree, pf_state_key,
 RB_GENERATE(pf_state_tree_id, pf_state,
 entry_id, pf_state_compare_id);
 
+SLIST_HEAD(pf_rule_gcl, pf_rule)   pf_rule_gcl =
+   SLIST_HEAD_INITIALIZER(pf_rule_gcl);
+
 __inline int
 pf_addr_compare(struct pf_addr *a, struct pf_addr *b, sa_family_t af)
 {
@@ -1137,6 +1140,24 @@ pf_state_export(struct pfsync_state *sp,
 /* END state table stuff */
 
 void
+pf_purge_expired_rules(void)
+{
+   struct pf_rule  *r;
+
+   if (SLIST_EMPTY(&pf_rule_gcl)) {
+   return;
+   }
+
+   rw_enter_write(&pf_consistency_lock);
+   while ((r = SLIST_FIRST(&pf_rule_gcl)) != NULL) {
+   SLIST_REMOVE(&pf_rule_gcl, r, pf_rule, gcle);
+   KASSERT(r->rule_flag & PFRULE_EXPIRED);
+   pf_purge_rule(r);
+   }
+   rw_exit_write(&pf_consistency_lock);
+}
+
+void
 pf_purge_thread(void *v)
 {
int nloops = 0, s;
@@ -1154,6 +1175,7 @@ pf_purge_thread(void *v)
if (++nloops >= pf_default_rule.timeout[PFTM_INTERVAL]) {
pf_purge_expired_fragments();
pf_purge_expired_src_nodes(0);
+   pf_purge_expired_rules();
nloops = 0;
}
 
@@ -3135,6 +3157,10 @@ pf_test_rule(struct pf_pdesc *pd, struct
ruleset = &pf_main_ruleset;
r = TAILQ_FIRST(pf_main_ruleset.rules.active.ptr);
while (r != NULL) {
+   if (r->rule_flag & PFRULE_EXPIRED) {
+   r = TAILQ_NEXT(r, entries);
+   goto nextrule;
+   }
r->evaluations++;
PF_TEST_ATTRIB((pfi_kif_match(r->kif, pd->kif) == r->ifnot),
r->skip[PF_SKIP_IFP].ptr);
@@ -3433,8 +3459,18 @@ pf_test_rule(struct pf_pdesc *pd, struct
}
 #endif /* NPFSYNC > 0 */
 
-   if (r->rule_flag & PFRULE_ONCE)
-   pf_purge_rule(ruleset, r, aruleset, a);
+   if (r->rule_flag & PFRULE_ONCE) {
+   int only_rule = (TAILQ_NEXT(r, entries) == NULL &&
+TAILQ_PREV(r, pf_rulequeue, entries) == NULL);
+
+   if (only_rule && a) {
+   atomic_setbits_int(>rule_flag, PFRULE_EXPIRED);
+   

Re: removing expired once rules in pf_purge_thread()

2015-12-15 Thread Richard Procter

On Tue, 15 Dec 2015, Mike Belopuhov wrote:

> >Yet another possibility is to drop 'once' rules as too complex to
> >implement for multiprocessor but I have no idea if this is viable.
> 
> It is.  And I have said that before with an authority of the implementor
> of "once" rules: since we don't have any userland applications that
> would use this yet, we can ditch them for now and possibly devise a
> better approach later.
 
> Don't make your lives harder than they have to be!

I tend to agree! And I can't see a way to reimplement it for a 
multithreaded pf without introducing downsides.

Quoting pf.conf(5) "once - Creates a one shot rule that will remove itself 
from an active ruleset after the first match."

A 'first' match presupposes a total ordering. This comes for free when pf 
is single-threaded but concurrent rule matching must take the trouble to 
re-impose it somewhere. (I assume that pf aims to do concurrent matching 
of packets against the ruleset.)

One approach serialises the candidacy of 'once' rules during rule 
matching. This ensures an unambiguous first match but costs pf a latent 
glass jaw. In the worst case, a 'once' rule heads the ruleset and matches 
any packet. Its candidacy therefore covers the entire rule-matching 
process and so fully serialises it.
 
The other approach serialises the packets matched by 'once' rules, after 
rule processing is done. This doesn't create a glass jaw but does permit 
the 'once' rule to match more than once.

To my taste neither alternative is especially palatable.

That said, a 'quick' 'once' rule has no candidacy window so it can be 
serialised by atomically testing/setting a flag when evaluating it. This 
also both avoids the glass jaw and provides at most one first match. 
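
For concreteness, here is a hypothetical sketch of that atomic claim
(nothing like it appears in the posted diff); it assumes atomic_cas_uint(9)
semantics and that rule_flag is updated atomically everywhere else too:

/* Hypothetical: whichever matching thread flips PFRULE_EXPIRED first owns
 * the single match of a 'quick' 'once' rule; later threads skip it. */
static int
pf_once_rule_claim(struct pf_rule *r)
{
	u_int32_t old;

	do {
		old = r->rule_flag;
		if (old & PFRULE_EXPIRED)
			return (0);	/* lost the race: treat as no match */
	} while (atomic_cas_uint(&r->rule_flag, old,
	    old | PFRULE_EXPIRED) != old);

	return (1);			/* first, and only, match */
}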

(Note that this issue of determining the first match is independent of 
whether a 'once' rule, having matched, is purged.)

On Wed, 16 Dec 2015, Mike Belopuhov wrote:

> It just occurred to me that another possibility would be a match-only 
> rule that matches one but doesn't involve any purging machinery.

That is an interesting idea. Like a 'quick' rule, a match rule has no 
candidacy window. However a 'once' match rule can no longer influence 
pass/block state --- I can't see how it is useful? 

best, 
Richard. 



Re: removing expired once rules in pf_purge_thread()

2015-12-18 Thread Richard Procter

Hi Sasha, 

On Fri, 18 Dec 2015, Alexandr Nedvedicky wrote:

> > Right. I'll just note though that the patch as it stands allows 
> > multiple winners [...] Whether that's a realistic issue, I don't know. 
> > I have though been bitten by enough edge cases like this to be very 
> > wary of them.
> 
> I think it's not realistic with current PF at OpenBSD. The pf_test() function
> does not run concurrently, so there can be no such race.

Fair enough :) I presumed that an upcoming concurrent pf_test() would rely 
on this patch.

> FYI: PF@solaris uses mutex there to protect the insertion to gcl. Code goes as
> follows at Solaris:
> 
>   if (!(r->rule_flags & PFRULE_EXPIRED)) {
>   mutex_enter(_mutex);
>   /* retest under exclusive protection */
>   if (!(r->rule_flags & PFRULE_EXPIRED)) {
>   r->rule_flag |= PFRULE_EXPIRED;
>   SLIST_INSERT_HEAD(_rule_gcl, r, gcle);
>   }
>   }

I was wondering about that. For this patch presumably it's thought 
unnecessary to mutex the SLIST ops (or for that matter to preserve rule 
lifetime) for one-shot rules as the purge thread runs so infrequently.

BTW, the purge queue exists to pass a result between threads; an 
alternative is to recompute it in the purge thread by searching for 
expired rules: no need for the queue, its mutexes, etc. Easier to preserve 
rule lifetime, too. May need another anchor stack though. I might have a 
go next year when the code is more settled.
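
Roughly what I have in mind, sketched against the names already used in
this patch (untested, and anchors/nested rulesets are deliberately ignored,
which is exactly the 'another anchor stack' caveat):

/* Hypothetical alternative to the gc list: let the purge thread rediscover
 * expired rules by walking the active ruleset under the consistency lock. */
void
pf_purge_expired_rules_scan(void)
{
	struct pf_rule	*r, *next;

	rw_enter_write(&pf_consistency_lock);
	for (r = TAILQ_FIRST(pf_main_ruleset.rules.active.ptr);
	    r != NULL; r = next) {
		next = TAILQ_NEXT(r, entries);
		if (r->rule_flag & PFRULE_EXPIRED)
			pf_purge_rule(r);
	}
	rw_exit_write(&pf_consistency_lock);
}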

Ok, enough from me on this.

best, 
Richard. 



Re: socppc/fdt: fix fdt_find_node for the case we don't find a node

2016-02-28 Thread Richard Procter
Hi Patrick, 

On Sun, 28 Feb 2016, Patrick Wildt wrote:

> Hi,
> 
> If we're calling fdt_find_node() and do not actually find the node we're
> looking for, we call strncmp with a NULL value. 
> [...]

Tested on RB600A: boots fine, dmesg unchanged (mod banner). 

Patch applied to HEAD, fdt.c:1.12.

best, 
Richard. 

> diff --git sys/arch/socppc/socppc/fdt.c sys/arch/socppc/socppc/fdt.c
> index 0dec4fb..741763c 100644
> --- sys/arch/socppc/socppc/fdt.c
> +++ sys/arch/socppc/socppc/fdt.c
> @@ -274,6 +274,13 @@ fdt_find_node(char *name)
>  
>   for (child = fdt_child_node(node); child;
>child = fdt_next_node(child)) {
> + /*
> +  * A child always starts with a FDT_NODE_BEGIN token.
> +  * If it's another token, we have reached the end of
> +  * the list but have not found a match.
> +  */
> + if (betoh32(*(uint32_t *)child) != FDT_NODE_BEGIN)
> + return NULL;
>   if (strncmp(p, fdt_node_name(child), q - p) == 0) {
>   node = child;
>   break;
> 
> 



Re: socppc/fdt: fix end signature check (again)

2016-02-28 Thread Richard Procter
Hi Patrick, 

On Sun, 28 Feb 2016, Patrick Wildt wrote:

> Hi,
> 
> unfortunately the end signature check is still not correct.  Consulting
> the spec cleared the confusion of why the check does not work on my ARM
> machines. 
> [...]

Tested on RB600A: boots fine, dmesg unchanged (mod banner).

Note, I don't know if this exercises the code. My debug printf() calls do 
not show in the console ... which is unsurprising as the console is as-yet 
uninitialised (consinit()). 

Patch applied to HEAD, fdt.c:1.12.

best, 
Richard.

> diff --git sys/arch/socppc/socppc/fdt.c sys/arch/socppc/socppc/fdt.c
> index 0dec4fb..8535c33 100644
> --- sys/arch/socppc/socppc/fdt.c
> +++ sys/arch/socppc/socppc/fdt.c
> @@ -60,7 +60,8 @@ fdt_check_head(void *fdt)
>  
>   /* check for end signature on version 17 blob */
>   if ((betoh32(fh->fh_version) >= 17) &&
> - (betoh32(*(ptr + betoh32(fh->fh_struct_size))) != FDT_END))
> + (betoh32(*(ptr + (betoh32(fh->fh_struct_off) / 4) +
> + (betoh32(fh->fh_struct_size) / 4) - 1)) != FDT_END))
>   return 0;
>  
>   return betoh32(fh->fh_version);
> 
> 



Re: fdt: move code from socppc to sys/dev/fdt/

2016-03-02 Thread Richard Procter


On Wed, 2 Mar 2016, Patrick Wildt wrote:

> Hi,
> 
> to allow FDT to be used on ARM I think we should move the FDT code from
> socppc to sys/dev/fdt/.  Opinions?
> 
> Could someone test this diff for me?

Sure --- it compiles and boots on socppc (RB600A). I applied patch to HEAD 
and removed the empty /usr/src/sys/arch/socppc/socppc/fdt.c

best, 
Richard. 

> 
> Patrick
> 
> diff --git sys/arch/socppc/conf/files.socppc sys/arch/socppc/conf/files.socppc
> index d1fab5e..ade4e53 100644
> --- sys/arch/socppc/conf/files.socppc
> +++ sys/arch/socppc/conf/files.socppc
> @@ -15,7 +15,7 @@ filearch/socppc/socppc/machdep.c
>  file arch/socppc/socppc/mem.c
>  file dev/cninit.c
>  file arch/socppc/socppc/db_interface.c   ddb
> -file arch/socppc/socppc/fdt.c
> +file dev/fdt/fdt.c
>  file arch/socppc/socppc/n1200_dts.S
>  
>  
> diff --git sys/arch/socppc/include/fdt.h sys/arch/socppc/include/fdt.h
> deleted file mode 100644
> index 849f6b2..000
> --- sys/arch/socppc/include/fdt.h
> +++ /dev/null
> @@ -1,61 +0,0 @@
> -/*   $OpenBSD: fdt.h,v 1.3 2009/10/01 20:21:05 dms Exp $ */
> -
> -/*
> - * Copyright (c) 2009 Dariusz Swiderski 
> - *
> - * Permission to use, copy, modify, and distribute this software for any
> - * purpose with or without fee is hereby granted, provided that the above
> - * copyright notice and this permission notice appear in all copies.
> - *
> - * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> - * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> - * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> - * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> - * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> - * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> - * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> - */
> -
> -struct fdt_head {
> - u_int32_t fh_magic;
> - u_int32_t fh_size;
> - u_int32_t fh_struct_off;
> - u_int32_t fh_strings_off;
> - u_int32_t fh_reserve_off;
> - u_int32_t fh_version;
> - u_int32_t fh_comp_ver; /* last compatible version */
> - u_int32_t fh_boot_cpu_id;  /* fh_version >=2 */
> - u_int32_t fh_strings_size; /* fh_version >=3 */
> - u_int32_t fh_struct_size;  /* fh_version >=17 */
> -};
> -
> -struct fdt {
> - struct fdt_head *header;
> - void *  tree;
> - void *  strings;
> - void *  memory;
> - int version;
> - int strings_size;
> -};
> -
> -#define FDT_MAGIC0xd00dfeed
> -#define FDT_NODE_BEGIN   0x01
> -#define FDT_NODE_END 0x02
> -#define FDT_PROPERTY 0x03
> -#define FDT_NOP  0x04
> -#define FDT_END  0x09
> -
> -#define FDT_CODE_VERSION 0x11
> -
> -int   fdt_init(void *);
> -void *fdt_next_node(void *);
> -void *fdt_child_node(void *);
> -char *fdt_node_name(void *);
> -void *fdt_find_node(char *);
> -int   fdt_node_property(void *, char *, char **);
> -void *fdt_parent_node(void *);
> -#ifdef DEBUG
> -void *fdt_print_property(void *, int);
> -void  fdt_print_node(void *, int);
> -void  fdt_print_tree(void);
> -#endif
> diff --git sys/arch/socppc/socppc/fdt.c sys/arch/socppc/socppc/fdt.c
> deleted file mode 100644
> index 0dec4fb..000
> --- sys/arch/socppc/socppc/fdt.c
> +++ /dev/null
> @@ -1,482 +0,0 @@
> -/*   $OpenBSD: fdt.c,v 1.12 2016/02/28 12:39:40 mpi Exp $*/
> -
> -/*
> - * Copyright (c) 2009 Dariusz Swiderski 
> - * Copyright (c) 2009 Mark Kettenis 
> - *
> - * Permission to use, copy, modify, and distribute this software for any
> - * purpose with or without fee is hereby granted, provided that the above
> - * copyright notice and this permission notice appear in all copies.
> - *
> - * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> - * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> - * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> - * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> - * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> - * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> - * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> - */
> -
> -
> -#include 
> -#include 
> -#include 
> -
> -#include 
> -
> -#include 
> -
> -unsigned int fdt_check_head(void *);
> -char *fdt_get_str(u_int32_t);
> -void *skip_property(u_int32_t *);
> -void *skip_props(u_int32_t *);
> -void *skip_node_name(u_int32_t *);
> -void *fdt_parent_node_recurse(void *, void *);
> -#ifdef DEBUG
> -void  fdt_print_node_recurse(void *, int);
> -#endif
> -
> -static int tree_inited = 0;
> -static struct fdt tree;
> -
> -unsigned int
> -fdt_check_head(void *fdt)
> -{
> - struct fdt_head *fh;
> - u_int32_t 

[PATCH] [0/11] pf checksum modification / refactoring

2016-08-10 Thread Richard Procter

Here is the first step in the pf checksum modification / refactoring 
series.

The complete series is available at http://203.79.107.124/. It differs 
from what I presented at the hackathon only by a small optimisation [0].

Overview
--------

The series is broken into two phases. 

Phase 1 is a minimal patch, reintroducing the 5.3 checksum fixup algorithm 
that preserves end-to-end checksums when fiddling with packets, but 
without the mess that motivated Henning to remove it (this is my own 
motivation; there are others). The patch affects only this one aspect of 
Henning's checksum work, without which it would be far, far uglier.

(Note: checksum modification will not be fully 'live' until regeneration 
is fully removed at the end of phase one.)

Phase 2 builds on Phase 1 and includes the refactorings I posted last year.

The complete patch shows no apparent performance regression, and even a slight 
improvement under small-packet DDoS, over both hw offload and software 
regeneration. IIRC throughput for 10G ix(4), which lacks offload, 
increases by ~5%. It also shaves off 26 non-comment lines of code (and 
adds 376 bytes of object code on amd64; on i386 it prunes ~500 bytes).

See http://203.79.107.124/UNITTEST.tar for userspace unittests of the new 
fixup algorithm. 

I've been running the complete patch for some time without issue, as has 
sthen@, and possibly others.

Patch Mechanics 
---

The complete series patch totals 50kB; to ease review it is split into 35 
diffs with commentary. These diffs are grouped into 12 patches to be 
committed (each patch shares a three digit prefix, e.g. 000*.diff forms 
one patch). Each aims to leave the code in a consistent state. 

I post for each patch the diffs that comprise it. Following these is their 
sum, the complete patch to be committed. (You can check they're equivalent 
by feeding the entire post to patch(1) and, when it comes to the complete 
patch, agreeing to apply that in reverse. This should leave the source 
unaltered.)

Thanks
--

Thanks to everyone who has supported this work in one way or another, and 
special mention to reannz.co.nz for providing me with a 10Gb test harness. 
Any faults remain of course my own.

[0] small optimisation: see 
http://203.79.107.124/017b_pf_change_32_remove_udp.diff 


ok? 

----- BEGIN DIFFS FOR PATCH 000 -----
* Re-introduce pf_cksum_fixup()

- Same algorithm as in 5.3 but for one well-tested tweak, detailed 
  below.

Unlike 5.3, the checksum is passed by reference for concision, replacing

*pd->pcksum = pf_cksum_fixup(*pd->pcksum, ...)
with
pf_cksum_fixup(pd->pcksum, ...)

Note: although this precludes the compiler optimisations for nested calls of
pure functions, which 5.3 took advantage of for its nested (unreadable)
pf_cksum_fixup() chains, these optimisations are no longer relevant: with the
introduction of pf_cksum_fixup_a() below, at most two consecutive
pf_cksum_fixup() calls are needed. (EON)

Regards the fixup algorithm tweak: The OpenBSD 5.3 fixup was (in essence)

x = ((x & 0xffff) + (x >> 16)) & 0xffff;

the new, equivalent, line is:

x = (x + (x >> 16)) & 0xffff;

For justification, see source comments. 
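
As a sanity check (a throwaway userland sketch of my own, not part of the
diff), the old and new folds can be compared exhaustively:

#include <stdint.h>
#include <stdio.h>

/* 5.3 fold */
static uint32_t
fold_old(uint32_t x)
{
	return ((x & 0xffff) + (x >> 16)) & 0xffff;
}

/* proposed fold */
static uint32_t
fold_new(uint32_t x)
{
	return (x + (x >> 16)) & 0xffff;
}

int
main(void)
{
	uint64_t x;

	/* exhaustive over all 32-bit inputs; takes a little while to run */
	for (x = 0; x <= 0xffffffffULL; x++)
		if (fold_old((uint32_t)x) != fold_new((uint32_t)x)) {
			printf("mismatch at 0x%08llx\n",
			    (unsigned long long)x);
			return 1;
		}
	printf("folds agree for all 32-bit inputs\n");
	return 0;
}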

* Introduce pf_patch_{8,16,16_unaligned,32}() interface

- modification of checksum-covered data is 
  assignment-with-side-effects. 
- all new functions will be used by later patches

 +ve provides type-appropriate checksum modification
 +ve will replace existing 'altered value' guards, 
 reducing code length
 -ve five new functions in total

C assignment hides behind one assignment operator the nitty gritty of
differing l-value widths. As we cannot change the language to suit our needs,
we are obliged to expose these differences in our interface.

An added wrinkle is that our side-effect, namely, modifying the checksum,
depends on the alignment of the l-value within the packet (the checksum's
summands are 16-bit aligned with respect to the packet). So the interface
provides _unaligned() versions parameterised by the l-value's packet
alignment, either 'hi' or 'lo'. Thankfully, these are unnecessary for most
protocol fields.

Later patches will augment these functions with 'altered value' guards,
allowing us to replace, e.g.

if (icmpid != pd->hdr.icmp->icmp_id) {
if (pd->csum_status == PF_CSUM_UNKNOWN)
pf_check_proto_cksum(pd, pd->off,
pd->tot_len - pd->off, pd->proto,
pd->af);
pd->hdr.icmp->icmp_id = icmpid;
rewrite = 1;
}

with
rewrite += pf_patch_16(pd, &pd->hdr.icmp->icmp_id, icmpid);

Lastly, thanks to mikeb@ for the name 'pf_patch_*'.

* Convert miscellaneous packet alteration to pf_patch_*() interface and
checksum modification.


Re: [PATCH] [0/11] pf checksum modification / refactoring

2016-08-16 Thread Richard Procter


On Tue, 9 Aug 2016, Richard Procter wrote:

> 
> Here is the first step in the pf checksum modification / refactoring 
> series.
> 
> The complete series is available at http://203.79.107.124/. It differs 
> from what I presented at the hackathon only by a small optimisation [0].

Change of plan. Here is the complete phase one (checksum modification).
It's easier to test, and easier to commit. The easier-to-review
steps remain up at http://203.79.107.124/ 

ok?

best, 
Richard. 

Index: net/pf.c
===
--- net.orig/pf.c
+++ net/pf.c
@@ -150,21 +150,28 @@ void   pf_init_threshold(struct 
pf_thre
u_int32_t);
 voidpf_add_threshold(struct pf_threshold *);
 int pf_check_threshold(struct pf_threshold *);
-
-voidpf_change_ap(struct pf_pdesc *, struct pf_addr *,
+int pf_check_tcp_cksum(struct mbuf *, int, int,
+   sa_family_t);
+static __inline voidpf_cksum_fixup(u_int16_t *, u_int16_t, u_int16_t,
+   u_int8_t);
+voidpf_translate_ap(struct pf_pdesc *, struct pf_addr *,
u_int16_t *, struct pf_addr *, u_int16_t);
+voidpf_cksum_fixup_a(u_int16_t *, const struct pf_addr *,
+   const struct pf_addr *, sa_family_t, u_int8_t);
 int pf_modulate_sack(struct pf_pdesc *,
struct pf_state_peer *);
 int pf_icmp_mapping(struct pf_pdesc *, u_int8_t, int *,
u_int16_t *, u_int16_t *);
-voidpf_change_icmp(struct pf_pdesc *, struct pf_addr *,
-   u_int16_t *, struct pf_addr *, struct pf_addr *,
-   u_int16_t);
 int pf_change_icmp_af(struct mbuf *, int,
struct pf_pdesc *, struct pf_pdesc *,
struct pf_addr *, struct pf_addr *, sa_family_t,
sa_family_t);
-int pf_translate_icmp_af(int, void *);
+voidpf_translate_a(struct pf_pdesc *, struct pf_addr *,
+   struct pf_addr *);
+voidpf_translate_icmp(struct pf_pdesc *, struct pf_addr *,
+   u_int16_t *, struct pf_addr *, struct pf_addr *,
+   u_int16_t);
+int pf_translate_icmp_af(struct pf_pdesc*, int, void *);
 voidpf_send_tcp(const struct pf_rule *, sa_family_t,
const struct pf_addr *, const struct pf_addr *,
u_int16_t, u_int16_t, u_int32_t, u_int32_t,
@@ -293,6 +300,8 @@ static __inline int pf_state_compare_key
struct pf_state_key *);
 static __inline int pf_state_compare_id(struct pf_state *,
struct pf_state *);
+static __inline void pf_cksum_uncover(u_int16_t *, u_int16_t, u_int8_t);
+static __inline void pf_cksum_cover(u_int16_t *, u_int16_t, u_int8_t);
 
 struct pf_src_tree tree_src_tracking;
 
@@ -1684,27 +1693,247 @@ pf_addr_wrap_neq(struct pf_addr_wrap *aw
}
 }
 
+/* This algorithm computes 'a + b - c' in ones-complement using a trick to
+ * emulate at most one ones-complement subtraction. This thereby limits net
+ * carries/borrows to at most one, eliminating a reduction step and saving one
+ * each of +, >>, & and ~.
+ *
+ * def. x mod y = x - (x//y)*y for integer x,y
+ * def. sum = x mod 2^16
+ * def. accumulator = (x >> 16) mod 2^16
+ *
+ * The trick works as follows: subtracting exactly one u_int16_t from the
+ * u_int32_t x incurs at most one underflow, wrapping its upper 16-bits, the
+ * accumulator, to 2^16 - 1. Adding this to the 16-bit sum preserves the
+ * ones-complement borrow:
+ *
+ *  (sum + accumulator) mod 2^16
+ * =   { assume underflow: accumulator := 2^16 - 1 }
+ *  (sum + 2^16 - 1) mod 2^16
+ * =   { mod }
+ *  (sum - 1) mod 2^16
+ *
+ * Although this breaks for sum = 0, giving 0xffff, which is ones-complement's
+ * other zero, not -1, that cannot occur: the 16-bit sum cannot be underflown
+ * to zero as that requires subtraction of at least 2^16, which exceeds a
+ * single u_int16_t's range.
+ *
+ * We use the following theorem to derive the implementation:
+ *
+ * th. (x + (y mod z)) mod z  =  (x + y) mod z   (0)
+ * proof.
+ * (x + (y mod z)) mod z
+ *=  { def mod }
+ * (x + y - (y//z)*z) mod z
+ *=  { (a + b*c) mod c = a mod c }
+ * (x + y) mod z   [end of proof]
+ *
+ * ... and thereby obtain:
+ *
+ *  (sum + accumulator) mod 2^16
+ * =   { def. accumulator, def. sum }
+ *  (x mod 2^16 + (x >> 16) mod 2^16) mod 2^16
+ * =   { (0), twice }
+ *  (x + (x >> 16)) mod 2^16
+ * =   { x mod 2^n = x & (2^n - 1) }
+ *  (x + (x >> 16)

refactor PF option parsing loops

2017-01-23 Thread Richard Procter
Hi, 

PF implements six distinct TCP option parsing loops. This patch converts 
these to one inline function in pfvar_priv.h, normalises their semantics, 
and strips ~100 lines. 

I've laid out the existing semantics below. The new loop implements the 
timestamp parser's semantics of "(s-b) (v-3) (t-a) (t-b)". As a result,

- MSS and WSCALE parsers are stricter about candidate option len.
- MSS normalization is more tolerant of malformed option lists.

These changes were immaterial to the live traffic I've examined (thanks to 
Richard Cziva for help in gathering the data).

I'd like test reports before committing. 

comments, oks? 

best, 
Richard. 

* postcondition on identity (return correct option) "validity"

- 'r[]' contains the returned option.

r[0] = TYPE /\

MIN_TYPELEN >= 2 /\ 

MAX_TYPELEN  = r[1]  = MIN_TYPELEN  (v-1)  (fixed-len opt)  or
MAX_TYPELEN >= r[1] >= MIN_TYPELEN  (v-2)  (weaker) or 
   r[1] >= MIN_TYPELEN  (v-3)  (weaker) or 
   r[1] >= 2             (v-4)  (weaker) or 
   true                  (v-5)  (weakest)


* postcondition on (eoh - opt) (ensure buffer safety) "safety"

- 'eoh' points to one past the end of option header
- 'opt' points to returned option

(eoh - opt) >= r[1] (s-a) 
(eoh - opt) >= MIN_TYPELEN  (s-b)


* parsing (tolerance of malformed headers) "tolerance"

r[0] = EOL          =>   continue with r[1] := 1   (t-b)
r[1] < 2            =>   continue with r[1] := 2   (t-a)
r[1] < 2            =>   break                     (t-d)
(eoh - opt) < r[1]  =>   break                     (t-c)


* Analysis of existing parsers
                                                        safety      validity  tolerance
[TS:fixed:10]         pf_norm.c:pf_normalize_tcp_init():      (s-b)   (v-3)   (t-a) (t-b)
[TS:fixed:10]         pf_norm.c:pf_normalize_tcp_stateful():  (s-b)   (v-3)   (t-a) (t-b)

[SACK:var:2 + (8*n)]  pf.c:pf_modulate_sack():          (s-a) (s-b)   (v-3)   (t-a) (t-b)

[MSS:fixed:4]         pf_norm.c:pf_normalize_mss():     (s-a) (s-b)   (v-4)   (t-c) (t-d)

[MSS:fixed:4]         pf.c:pf_get_mss():                      (s-b)   (v-5)   (t-a) (t-b)
[WSCL:fixed:3]        pf.c:pf_get_wscale():                   (s-b)   (v-5)   (t-a) (t-b)
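
For concreteness, the single loop those semantics imply looks roughly like
this (my sketch of the idea only; the committed pfvar_priv.h helper differs
in naming and detail):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TCPOPT_EOL	0
#define TCPOPT_NOP	1

/* return the offset of the first matching option in opts[0..olen), or -1 */
static int
find_tcpopt(const uint8_t *opts, size_t olen, uint8_t type, size_t min_typelen)
{
	size_t off = 0;

	if (min_typelen < 2)
		return -1;

	while (off + min_typelen <= olen) {			/* (s-b) */
		size_t optlen;

		if (opts[off] == TCPOPT_EOL || opts[off] == TCPOPT_NOP)
			optlen = 1;				/* (t-b) */
		else if (opts[off + 1] < 2)
			optlen = 2;				/* (t-a) */
		else
			optlen = opts[off + 1];

		if (opts[off] == type && optlen >= min_typelen)
			return (int)off;			/* (v-3) */

		off += optlen;
	}
	return -1;
}

int
main(void)
{
	/* NOP, NOP, then a timestamp option: kind 8, len 10 */
	uint8_t opts[] = { 1, 1, 8, 10, 0, 0, 0, 1, 0, 0, 0, 2 };

	printf("timestamp option at offset %d\n",
	    find_tcpopt(opts, sizeof(opts), 8, 10));
	return 0;
}

Pulling the option block out of the mbuf is a separate concern and not shown.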


--- 

Index: net/pf_norm.c
===
--- net.orig/pf_norm.c
+++ net/pf_norm.c
@@ -901,8 +901,9 @@ pf_normalize_tcp_init(struct pf_pdesc *p
 {
struct tcphdr   *th = &pd->hdr.tcp;
u_int32_ttsval, tsecr;
-   u_int8_t hdr[60];
-   u_int8_t*opt;
+   int  olen;
+   u_int8_t opts[MAX_TCPOPTLEN], *opt = opts;
+
 
KASSERT(src->scrub == NULL);
 
@@ -935,44 +936,28 @@ pf_normalize_tcp_init(struct pf_pdesc *p
if ((th->th_flags & TH_SYN) == 0)
return (0);
 
-   if (th->th_off > (sizeof(struct tcphdr) >> 2) && src->scrub &&
-   pf_pull_hdr(pd->m, pd->off, hdr, th->th_off << 2, NULL, NULL,
-   pd->af)) {
-   /* Diddle with TCP options */
-   int hlen;
+   if (src->scrub == NULL)
+   return (0);
 
-   opt = hdr + sizeof(struct tcphdr);
-   hlen = (th->th_off << 2) - sizeof(struct tcphdr);
-   while (hlen >= TCPOLEN_TIMESTAMP) {
-   switch (*opt) {
-   case TCPOPT_EOL:/* FALLTHROUGH */
-   case TCPOPT_NOP:
-   opt++;
-   hlen--;
-   break;
-   case TCPOPT_TIMESTAMP:
-   if (opt[1] >= TCPOLEN_TIMESTAMP) {
-   src->scrub->pfss_flags |=
-   PFSS_TIMESTAMP;
-   src->scrub->pfss_ts_mod = arc4random();
-
-   /* note PFSS_PAWS not set yet */
-   memcpy(&tsval, &opt[2],
-   sizeof(u_int32_t));
-   memcpy(&tsecr, &opt[6],
-   sizeof(u_int32_t));
-   src->scrub->pfss_tsval0 = ntohl(tsval);
-   src->scrub->pfss_tsval = ntohl(tsval);
-   src->scrub->pfss_tsecr = ntohl(tsecr);
-   getmicrouptime(&src->scrub->pfss_last);
-   }
-   /* FALLTHROUGH */
-   default:
-   hlen -= MAX(opt[1], 2);
-   opt += MAX(opt[1], 2);
-   

Re: refactor PF option parsing loops

2017-01-23 Thread Richard Procter

PS Find this patch broken down for easier review here:  

http://203.79.107.124/opts/

On Tue, 24 Jan 2017, Richard Procter wrote:

> Hi,
>
> PF implements six distinct TCP option parsing loops. This patch converts
> these to one inline function in pfvar_priv.h, normalises their semantics,
> and strips ~100 lines.




Re: iwm: fix a timeout race

2017-01-20 Thread Richard Procter
Hi,

On 18/01/2017, Stefan Sperling  wrote:
> I managed to trigger the following uvm fault by continously switching an
> iwm(4) client between two APs on different channels, i.e. a loop that runs:
> ifconfig iwm0 chan X; sleep 10; ifconfig iwm chan Y; sleep 10;
>
>  uvm_fault(0x819614a0, 0x7, 0, 2) -> e
>  kernel: page fault trap, code=0
>  Stopped atsoftclock+0x32: movq %rdx,0x8(%rax)
>
> This diff seems to fix the problem. It cancels mira timeouts from
> interrupt context where a state change is scheduled, instead of
> cancelling mira timeouts from the process context which performs
> the actual state change.

If I understand right, the crash is due to calling timeout_set() via
ieee80211_mira_node_init() while mira timeouts are active.

Now, these timeouts are cancelled on transition from the RUN state
(and cannot be cancelled unconditionally). However the guard to detect
this transition is unreliable -- the root cause -- as it races against
ic->ic_state.

This would imply there is a more general problem with identifying
state transitions within iwm_newstate_task?

That said, I was unable to further simplify your patch; it addresses
the immediate issue, and I've tested it on my iwm(4) without problem.
ok procter@

best,
Richard.

iwm0: hw rev 0x210, fw ver 16.242414.0, address xx:xx:xx:xx:xx:xx

> Also, cancel mira timeouts when the interface is brought down in
> iwm_stop() because this code path does not go through iwm_newstate().
>
> ok?
>
> Note that unless we are in RUN state the timeouts have not been initialized
> so timeout_del() inside mira_cancel_timeouts() would panic if we called it.
> This can't be done in a different way since mira state can only be
> initialized once the node for our AP (ic->ic_bss) is available.
>
> Index: if_iwm.c
> ===
> RCS file: /cvs/src/sys/dev/pci/if_iwm.c,v
> retrieving revision 1.156
> diff -u -p -r1.156 if_iwm.c
> --- if_iwm.c  12 Jan 2017 18:06:57 -  1.156
> +++ if_iwm.c  17 Jan 2017 18:59:20 -
> @@ -5496,10 +5496,8 @@ iwm_newstate_task(void *psc)
>   if (ostate == IEEE80211_S_SCAN && nstate != ostate)
>   iwm_led_blink_stop(sc);
>
> - if (ostate == IEEE80211_S_RUN && nstate != ostate) {
> + if (ostate == IEEE80211_S_RUN && nstate != ostate)
>   iwm_disable_beacon_filter(sc);
> - ieee80211_mira_cancel_timeouts(&in->in_mn);
> - }
>
>   /* Reset the device if moving out of AUTH, ASSOC, or RUN. */
>   /* XXX Is there a way to switch states without a full reset? */
> @@ -5632,8 +5630,11 @@ iwm_newstate(struct ieee80211com *ic, en
>  {
>   struct ifnet *ifp = IC2IFP(ic);
>   struct iwm_softc *sc = ifp->if_softc;
> + struct iwm_node *in = (void *)ic->ic_bss;
>
>   timeout_del(&sc->sc_calib_to);
> + if (ic->ic_state == IEEE80211_S_RUN)
> + ieee80211_mira_cancel_timeouts(&in->in_mn);
>
>   sc->ns_nstate = nstate;
>   sc->ns_arg = arg;
> @@ -6116,6 +6117,8 @@ iwm_stop(struct ifnet *ifp, int disable)
>   ifq_clr_oactive(&ifp->if_snd);
>
>   in->in_phyctxt = NULL;
> + if (ic->ic_state == IEEE80211_S_RUN)
> + ieee80211_mira_cancel_timeouts(&in->in_mn);
>
>   task_del(systq, &sc->init_task);
>   task_del(sc->sc_nswq, &sc->newstate_task);
>
>



[PATCH] [0/1] pf refactoring

2016-08-19 Thread Richard Procter
Hi, 

I've reduced the pf refactor (phase two) to two patches, which I'll be 
committing in 24 hours or so unless there are any objections.

I'm confident they won't, but supposing post-commit these have in 
fact blown up, my first suspect would be the af-to paths.


This patch removes pf_translate_ap(). 
The 'working' appears at http://203.79.107.124/ in the 012*.diff files.

best, 
Richard. 

Index: net/pf.c
===
--- net.orig/pf.c
+++ net/pf.c
@@ -154,8 +154,6 @@ int  pf_check_tcp_cksum(struct mbuf *,
sa_family_t);
 static __inline voidpf_cksum_fixup(u_int16_t *, u_int16_t, u_int16_t,
u_int8_t);
-voidpf_translate_ap(struct pf_pdesc *, struct pf_addr *,
-   u_int16_t *, struct pf_addr *, u_int16_t);
 voidpf_cksum_fixup_a(u_int16_t *, const struct pf_addr *,
const struct pf_addr *, sa_family_t, u_int8_t);
 int pf_modulate_sack(struct pf_pdesc *,
@@ -1910,16 +1908,6 @@ pf_patch_32(struct pf_pdesc *pd, u_int32
 }
 
 void
-pf_translate_ap(struct pf_pdesc *pd, struct pf_addr *a, u_int16_t *p,
-struct pf_addr *an, u_int16_t pn)
-{
-   if (pd->af == pd->naf)
-   pf_translate_a(pd, a, an);
-   if (p != NULL)
-   pf_patch_16(pd, p, pn);
-}
-
-void
 pf_patch_32_unaligned(struct pf_pdesc *pd, void *f, u_int32_t v, bool hi)
 {
u_int8_t *fb = (u_int8_t*)f;
@@ -4009,12 +3997,16 @@ pf_translate(struct pf_pdesc *pd, struct
case IPPROTO_TCP:
if (afto || PF_ANEQ(saddr, pd->src, pd->af) ||
*pd->sport != sport) {
-   pf_translate_ap(pd, pd->src, pd->sport, saddr, sport);
+   if (pd->af == pd->naf)
+   pf_translate_a(pd, pd->src, saddr);
+   pf_patch_16(pd, pd->sport, sport);
rewrite = 1;
}
if (afto || PF_ANEQ(daddr, pd->dst, pd->af) ||
*pd->dport != dport) {
-   pf_translate_ap(pd, pd->dst, pd->dport, daddr, dport);
+   if (pd->af == pd->naf)
+   pf_translate_a(pd, pd->dst, daddr);
+   pf_patch_16(pd, pd->dport, dport);
rewrite = 1;
}
break;
@@ -4022,12 +4014,16 @@ pf_translate(struct pf_pdesc *pd, struct
case IPPROTO_UDP:
if (afto || PF_ANEQ(saddr, pd->src, pd->af) ||
*pd->sport != sport) {
-   pf_translate_ap(pd, pd->src, pd->sport, saddr, sport);
+   if (pd->af == pd->naf)
+   pf_translate_a(pd, pd->src, saddr);
+   pf_patch_16(pd, pd->sport, sport);
rewrite = 1;
}
if (afto || PF_ANEQ(daddr, pd->dst, pd->af) ||
*pd->dport != dport) {
-   pf_translate_ap(pd, pd->dst, pd->dport, daddr, dport);
+   if (pd->af == pd->naf)
+   pf_translate_a(pd, pd->dst, daddr);
+   pf_patch_16(pd, pd->dport, dport);
rewrite = 1;
}
break;
@@ -4078,11 +4074,11 @@ pf_translate(struct pf_pdesc *pd, struct
rewrite = 1;
} else {
if (PF_ANEQ(saddr, pd->src, pd->af)) {
-   pf_translate_ap(pd, pd->src, NULL, saddr, 0);
+   pf_translate_a(pd, pd->src, saddr);
rewrite = 1;
}
if (PF_ANEQ(daddr, pd->dst, pd->af)) {
-   pf_translate_ap(pd, pd->dst, NULL, daddr, 0);
+   pf_translate_a(pd, pd->dst, daddr);
rewrite = 1;
}
}
@@ -4113,11 +4109,11 @@ pf_translate(struct pf_pdesc *pd, struct
 #ifdef INET6
case AF_INET6:
if (!afto && PF_ANEQ(saddr, pd->src, pd->af)) {
-   pf_translate_ap(pd, pd->src, NULL, saddr, 0);
+   pf_translate_a(pd, pd->src, saddr);
rewrite = 1;
}
if (!afto && PF_ANEQ(daddr, pd->dst, pd->af)) {
-   pf_translate_ap(pd, pd->dst, NULL, daddr, 0);
+   pf_translate_a(pd, pd->dst, daddr);
rewrite = 1;
}
break;
@@ -4738,18 +4734,24 @@ pf_test_state(struct pf_pdesc *pd, struc
 #endif /* INET6 */
 
if (afto || PF_ANEQ(pd->src, 

[PATCH] [1/1] pf refactoring

2016-08-19 Thread Richard Procter

The final patch in the pf series. Will commit when I do the previous one 
in around 24 hours unless there are objections.


- pushes the 'field changed' guards into the 'change field' functions. 
This lets us normalise many of the existing guards and eliminate lots of 
code from pf_translate().

- while here, optimise pf_patch_32() a little. Relatively clean, and it gains a 
25% speedup for a function that can be called four times per TCP segment. 
I don't know if this matters in the big picture.

- simplify pf_match_addr(). Compiler achieves the same optimisation;
I saw no difference in the assembly.


The 'working' for these appears at http://203.79.107.124/ under the 
phase two heading. 

best, 
Richard. 

Index: net/pf.c
===
--- net.orig/pf.c
+++ net/pf.c
@@ -160,15 +160,15 @@ intpf_modulate_sack(struct 
pf_pdesc
struct pf_state_peer *);
 int pf_icmp_mapping(struct pf_pdesc *, u_int8_t, int *,
u_int16_t *, u_int16_t *);
 int pf_change_icmp_af(struct mbuf *, int,
struct pf_pdesc *, struct pf_pdesc *,
struct pf_addr *, struct pf_addr *, sa_family_t,
sa_family_t);
-voidpf_translate_a(struct pf_pdesc *, struct pf_addr *,
+int pf_translate_a(struct pf_pdesc *, struct pf_addr *,
struct pf_addr *);
 voidpf_translate_icmp(struct pf_pdesc *, struct pf_addr *,
u_int16_t *, struct pf_addr *, struct pf_addr *,
u_int16_t);
 int pf_translate_icmp_af(struct pf_pdesc*, int, void *);
 voidpf_send_tcp(const struct pf_rule *, sa_family_t,
const struct pf_addr *, const struct pf_addr *,
@@ -1859,73 +1859,103 @@ pf_cksum_fixup_a(u_int16_t *cksum, const
return;
	if (udp && x == 0x0000)
	x = 0xffff;
 
*cksum = (u_int16_t)(x);
 }
 
-void
+int
 pf_patch_8(struct pf_pdesc *pd, u_int8_t *f, u_int8_t v, bool hi)
 {
-   u_int16_t new = htons(hi ? ( v << 8) :  v);
-   u_int16_t old = htons(hi ? (*f << 8) : *f);
+   int rewrite = 0;
 
-   pf_cksum_fixup(pd->pcksum, old, new, pd->proto);
-   *f = v;
+   if (*f != v) {
+   u_int16_t old = htons(hi ? (*f << 8) : *f);
+   u_int16_t new = htons(hi ? ( v << 8) :  v);
+
+   pf_cksum_fixup(pd->pcksum, old, new, pd->proto);
+   *f = v;
+   rewrite = 1;
+   }
+
+   return (rewrite);
 }
 
 /* pre: *f is 16-bit aligned within its packet */
-void
+int
 pf_patch_16(struct pf_pdesc *pd, u_int16_t *f, u_int16_t v)
 {
-   pf_cksum_fixup(pd->pcksum, *f, v, pd->proto);
-   *f = v;
+   int rewrite = 0;
+
+   if (*f != v) {
+   pf_cksum_fixup(pd->pcksum, *f, v, pd->proto);
+   *f = v;
+   rewrite = 1;
+   }
+
+   return (rewrite);
 }
 
-void
+int
 pf_patch_16_unaligned(struct pf_pdesc *pd, void *f, u_int16_t v, bool hi)
 {
-   u_int8_t *fb = (u_int8_t*)f;
-   u_int8_t *vb = (u_int8_t*)&v;
+   int rewrite = 0;
+   u_int8_t   *fb = (u_int8_t*)f;
+   u_int8_t   *vb = (u_int8_t*)&v;
 
if (hi && ALIGNED_POINTER(f, u_int16_t)) {
-   pf_patch_16(pd, f, v); /* optimise */
-   return;
+   return (pf_patch_16(pd, f, v)); /* optimise */
}
 
-   pf_patch_8(pd, fb++, *vb++, hi);
-   pf_patch_8(pd, fb++, *vb++,!hi);
+   rewrite += pf_patch_8(pd, fb++, *vb++, hi);
+   rewrite += pf_patch_8(pd, fb++, *vb++,!hi);
+
+   return (rewrite);
 }
 
 /* pre: *f is 16-bit aligned within its packet */
-void
+/* pre: pd->proto != IPPROTO_UDP */
+int
 pf_patch_32(struct pf_pdesc *pd, u_int32_t *f, u_int32_t v)
 {
-   u_int16_t *pc = pd->pcksum;
+   int rewrite = 0;
+   u_int16_t  *pc = pd->pcksum;
+   u_int8_tproto = pd->proto;
+
+   /* optimise: inline udp fixup code is unused; let compiler scrub it */
+   if (proto == IPPROTO_UDP)
+   panic("pf_patch_32: udp");
+
+   /* optimise: skip *f != v guard; true for all use-cases */
+   pf_cksum_fixup(pc, *f / (1 << 16), v / (1 << 16), proto);
+   pf_cksum_fixup(pc, *f % (1 << 16), v % (1 << 16), proto);
 
-   pf_cksum_fixup(pc, *f / (1 << 16), v / (1 << 16), pd->proto);
-   pf_cksum_fixup(pc, *f % (1 << 16), v % (1 << 16), pd->proto);
*f = v;
+   rewrite = 1;
+
+   return (rewrite);
 }
 
-void
+int
 pf_patch_32_unaligned(struct pf_pdesc *pd, void *f, u_int32_t v, bool hi)
 {
-   u_int8_t *fb = (u_int8_t*)f;
-   u_int8_t *vb = (u_int8_t*)&v;
+   int rewrite = 0;
+   u_int8_t   *fb = 

Re: [PATCH] [0/1] pf refactoring

2016-08-19 Thread Richard Procter
Hi Mike, 

On Fri, 19 Aug 2016, Mike Belopuhov wrote:

> I've looked through it and couldn't find anything wrong with it.

Thanks. 

> I do however find pacthing of values in pf_translate_icmp_af
> unneccessary since we'll be throwing away the original header
> anyway.

Do you mean e.g. circa line 2602, pf.c:pf_translate_icmp_af() 
(HEAD + complete diff)?

pf_patch_8(pd, &icmp4->icmp_type, type, PF_HI);
pf_patch_8(pd, &icmp4->icmp_code, code, PF_LO);
pf_patch_16(pd, &icmp4->icmp_nextmtu, htons(mtu));
if (ptr >= 0)
pf_patch_32(pd, &icmp4->icmp_void, htonl(ptr));

My understanding is that the ICMP v4 and v6 headers are so similar that pf 
af-to hacks the one into the other; the code above is filling an ICMPv4 
header with v6 values; the updates are not redundant. It muddled me a bit 
when I was converting it.

> 
> OK mikeb for the diff.

best, 
Richard. 



Re: pf_route pf_pdesc

2016-10-22 Thread Richard Procter

On Fri, 21 Oct 2016, Alexander Bluhm wrote:
> We could put the union pf_headers into a separate header file.
> Henning, I hope you don't mind that I used a current license.template
> with your old pf_headers code.
> 
> Do we want to go this way?

You got me thinking how else this might be done.

Note that union pf_headers is used only to calculate the max size of the 
header pull buffer. Its members are never referenced. 

So it may be enough to #define PF_MAXHDR_SIZE 28 in pfvar.h (== 
sizeof(union pf_headers)), then verify it by a compile-time assert in 
pf.c.

   +ve no new header file to support what amounts to an
   implementation detail.

best, 
Richard. 

 
[a sketch --- compiled but untested beyond checking the CTASSERT fires 
when PF_MAXHDR_SIZE = 20]

Index: net/pf.c
===
--- net.orig/pf.c
+++ net/pf.c
@@ -128,7 +128,7 @@ struct pf_anchor_stackframe {
 
 /*
  * Cannot fold into pf_pdesc directly, unknown storage size outside pf.c.
- * Keep in sync with union pf_headers in pflog_bpfcopy() in if_pflog.c.
+ * Cannot be put into pfvar.h as that is included in too many places.
  */
 union pf_headers {
struct tcphdr   tcp;
@@ -140,7 +140,7 @@ union pf_headers {
struct nd_neighbor_solicit nd_ns;
 #endif /* INET6 */
 };
-
+CTASSERT(PF_MAXHDR_SIZE >= sizeof(union pf_headers));
 
 struct pool pf_src_tree_pl, pf_rule_pl, pf_queue_pl;
 struct pool pf_state_pl, pf_state_key_pl, pf_state_item_pl;
@@ -6613,7 +6613,7 @@ pf_test(sa_family_t af, int fwdir, struc
struct pf_state *s = NULL;
struct pf_ruleset   *ruleset = NULL;
struct pf_pdesc  pd;
-   union pf_headers pdhdrs;
+   u_int8_t pdhdrs[PF_MAXHDR_SIZE];
int  dir = (fwdir == PF_FWD) ? PF_OUT : fwdir;
u_int32_tqid, pqid = 0;
 
Index: net/pfvar.h
===
--- net.orig/pfvar.h
+++ net/pfvar.h
@@ -1156,6 +1156,8 @@ enum pfi_kif_refs {
 #define PFI_IFLAG_SKIP 0x0100  /* skip filtering on interface */
 #define PFI_IFLAG_ANY  0x0200  /* match any non-loopback interface */
 
+#define PF_MAXHDR_SIZE 28  /* max pf pf_pdesc header buffer */
+
 struct pf_pdesc {
struct {
int  done;
Index: net/if_pflog.c
===
--- net.orig/if_pflog.c
+++ net/if_pflog.c
@@ -297,16 +297,7 @@ pflog_bpfcopy(const void *src_arg, void
u_intcount;
u_char  *dst, *mdst;
int  afto, hlen, mlen, off;
-   union pf_headers {
-   struct tcphdr   tcp;
-   struct udphdr   udp;
-   struct icmp icmp;
-#ifdef INET6
-   struct icmp6_hdricmp6;
-   struct mld_hdr  mld;
-   struct nd_neighbor_solicit nd_ns;
-#endif /* INET6 */
-   } pdhdrs;
+   u_int8_t pdhdrs[PF_MAXHDR_SIZE];
 
struct pf_pdesc  pd;
struct pf_addr   osaddr, odaddr;



Re: pf af-to route output

2016-11-19 Thread Richard Procter

On Mon, 14 Nov 2016, Alexander Bluhm wrote:

> Hi,
> 
> The !r->rt case is only used by af-to.  pf_route6() calls ip6_output()
> to do the work while pf_route() has some custom implementation for
> that.  It is simpler to call ip_output() or ip6_output() from
> pf_test() directly.
> 
> ok?

Note, pf_route() calls pf_test() only if (pd->kif->pfik_ifp != ifp).
(I read this as 'pf changed the packet's interface'.) 

Using ip_output() avoids this guard. If it's an optimisation, no problem, 
but that's unclear to me. 

(I suspect it's ok, as af-to is invalid in out-bound rules: so the guard 
is always true and pf_test() is always called, unless the af-to packet is 
being sent out the interface it arrived on. But pf_test() should be called 
for all output packets, so this patch would improve the situation.)

With that proviso, ok procter@ 

best, 
Richard. 




[PATCH] pf: fold pf_headers into pf_pdesc

2016-11-19 Thread Richard Procter
Hi, 

This patch folds pf_headers into pf_pdesc, and eliminates pf_pdesc's 
header pointers. It's mostly mechanical except for strengthening a guard 
in pf_socket_lookup().
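
Roughly, the shape of the change is this (an illustration with abridged
fields and made-up type names, not the real pf_pdesc declaration):

#include <sys/types.h>
#include <stdio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <netinet/udp.h>

/* before: pd->hdr holds pointers into a separately supplied buffer */
struct pdesc_before {
	union {
		struct tcphdr	*tcp;
		struct udphdr	*udp;
		void		*any;
	} hdr;				/* used as pd->hdr.tcp->th_sport */
};

/* after: the header storage is embedded in the descriptor itself */
struct pdesc_after {
	union {
		struct tcphdr	 tcp;
		struct udphdr	 udp;
	} hdr;				/* used as pd->hdr.tcp.th_sport */
};

int
main(void)
{
	printf("before: %zu bytes, after: %zu bytes\n",
	    sizeof(struct pdesc_before), sizeof(struct pdesc_after));
	return 0;
}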

I've an OK from bluhm@ but to give others a heads-up I won't commit for 
another 48 hours.

+ve removes the header buffer indirection
+ve shaves a parameter from pf_setup_pdesc; no need to allocate 
a separate header struct on the stack.
+ve lets us scrub a few (void*) casts 
+ve shaves ~200 bytes from pf.o on i386 (adds ~20 on amd64). 

-ve it's relatively big 
-ve? pf_headers is now bzero'ed on init (28 bytes).


As per usual I've broken this into an easier-to-review series 
of patches, which is available at http://203.79.107.124/hdr 

Compiled and tested for basic sanity on i386. 

best, 
Richard. 

-- 

(includes whitespace fixes from bluhm@) 

Index: net/pf.c
===
--- net.orig/pf.c
+++ net/pf.c
@@ -773,25 +773,22 @@ pf_state_key_addr_setup(struct pf_pdesc
 {
struct pf_state_key_cmp *key = arg;
 #ifdef INET6
-   struct nd_neighbor_solicit *nd;
struct pf_addr *target;
 
if (af == AF_INET || pd->proto != IPPROTO_ICMPV6)
goto copy;
 
-   switch (pd->hdr.icmp6->icmp6_type) {
+   switch (pd->hdr.icmp6.icmp6_type) {
case ND_NEIGHBOR_SOLICIT:
if (multi)
return (-1);
-   nd = (void *)pd->hdr.icmp6;
-   target = (struct pf_addr *)&nd->nd_ns_target;
+   target = (struct pf_addr *)&pd->hdr.nd_ns.nd_ns_target;
daddr = target;
break;
case ND_NEIGHBOR_ADVERT:
if (multi)
return (-1);
-   nd = (void *)pd->hdr.icmp6;
-   target = (struct pf_addr *)&nd->nd_ns_target;
+   target = (struct pf_addr *)&pd->hdr.nd_ns.nd_ns_target;
saddr = target;
if (IN6_IS_ADDR_MULTICAST(&pd->dst->v6)) {
key->addr[didx].addr32[0] = 0;
@@ -1978,7 +1975,7 @@ pf_icmp_mapping(struct pf_pdesc *pd, u_i
/* FALLTHROUGH */
case ICMP_ECHOREPLY:
*virtual_type = ICMP_ECHO;
-   *virtual_id = pd->hdr.icmp->icmp_id;
+   *virtual_id = pd->hdr.icmp.icmp_id;
break;
 
case ICMP_TSTAMP:
@@ -1986,7 +1983,7 @@ pf_icmp_mapping(struct pf_pdesc *pd, u_i
/* FALLTHROUGH */
case ICMP_TSTAMPREPLY:
*virtual_type = ICMP_TSTAMP;
-   *virtual_id = pd->hdr.icmp->icmp_id;
+   *virtual_id = pd->hdr.icmp.icmp_id;
break;
 
case ICMP_IREQ:
@@ -1994,7 +1991,7 @@ pf_icmp_mapping(struct pf_pdesc *pd, u_i
/* FALLTHROUGH */
case ICMP_IREQREPLY:
*virtual_type = ICMP_IREQ;
-   *virtual_id = pd->hdr.icmp->icmp_id;
+   *virtual_id = pd->hdr.icmp.icmp_id;
break;
 
case ICMP_MASKREQ:
@@ -2002,7 +1999,7 @@ pf_icmp_mapping(struct pf_pdesc *pd, u_i
/* FALLTHROUGH */
case ICMP_MASKREPLY:
*virtual_type = ICMP_MASKREQ;
-   *virtual_id = pd->hdr.icmp->icmp_id;
+   *virtual_id = pd->hdr.icmp.icmp_id;
break;
 
case ICMP_IPV6_WHEREAREYOU:
@@ -2060,14 +2057,14 @@ pf_icmp_mapping(struct pf_pdesc *pd, u_i
/* FALLTHROUGH */
case ICMP6_ECHO_REPLY:
*virtual_type = ICMP6_ECHO_REQUEST;
-   *virtual_id = pd->hdr.icmp6->icmp6_id;
+   *virtual_id = pd->hdr.icmp6.icmp6_id;
break;
 
case MLD_LISTENER_QUERY:
*icmp_dir = PF_IN;
/* FALLTHROUGH */
case MLD_LISTENER_REPORT: {
-   struct mld_hdr *mld = (void *)pd->hdr.icmp6;
+   struct mld_hdr *mld = &pd->hdr.mld;
u_int32_t h;
 
*virtual_type = MLD_LISTENER_QUERY;
@@ -2104,7 +2101,7 @@ pf_icmp_mapping(struct pf_pdesc *pd, u_i
*icmp_dir = PF_IN;
/* FALLTHROUGH */
case ND_NEIGHBOR_ADVERT: {
-   struct nd_neighbor_solicit *nd = (void *)pd->hdr.icmp6;
+   struct nd_neighbor_solicit *nd = &pd->hdr.nd_ns;
u_int32_t h;
 
*virtual_type = ND_NEIGHBOR_SOLICIT;
@@ -2307,7 +2304,7 @@ pf_translate_af(struct pf_pdesc *pd)
 
/* flush pd->pcksum */
if (copyback)
-   m_copyback(pd->m, pd->off, pd->hdrlen, pd->hdr.any, M_NOWAIT);
+

Re: allow iwm_stop to sleep

2017-08-26 Thread Richard Procter
Hi Stefan, 

On 14/08/2017, at 7:07 PM, Stefan Sperling wrote:

> This diff makes iwm_stop() always run in a process context.
> I want iwm_stop() to be able to sleep so that it can wait for asynchronous
> driver tasks, and perhaps even wait for firmware commands, in the future.
> 
> If the interrupt handler detects a fatal condition, instead of calling
> iwm_stop() directly, defer to the init task. The init task looks at flags
> to understand what happened and restarts or stops the device as appropriate.
> 
> I found that toggling the RF kill switch can trigger a fatal firmware error.
> Hence I am letting the interrupt handler check RF kill before checking for
> fatal firmware error. Provides better error reporting when the kill switch
> is used.
> 
> During suspend, bring the device down during the QUIESCE stage which
> is allowed to sleep.
> 
> dhclient/down/scan in a loop still works (as far as it always has, with
> various errors reported in dmesg). Suspend/resume still works.

I've tested mine with a 'while true; do ifconfig down; ifconfig up; done' 
concurrent with a continuous kernel build. An hour later everything still 
worked.
And I did see some 

iwm0: fatal firmware error
iwm0: failed to update MAC
etc

My computestick lacks an RF kill button to test, though.

One simplification below.

ok procter@

best, 
Richard. 

iwm0 at pci1 dev 0 function 0 "Intel Dual Band Wireless AC 7265" rev 0x69, msi
iwm0: hw rev 0x210, fw ver 16.242414.0, address xx:xx:xx:xx:xx:xx

> ok?
> 
> Index: if_iwm.c
> ===
> RCS file: /cvs/src/sys/dev/pci/if_iwm.c,v
> retrieving revision 1.211
> diff -u -p -r1.211 if_iwm.c
> --- if_iwm.c  13 Aug 2017 18:08:03 -  1.211
> +++ if_iwm.c  14 Aug 2017 06:44:59 -
> @@ -6517,6 +6517,7 @@ iwm_stop(struct ifnet *ifp)
>   sc->sc_flags &= ~IWM_FLAG_BINDING_ACTIVE;
>   sc->sc_flags &= ~IWM_FLAG_STA_ACTIVE;
>   sc->sc_flags &= ~IWM_FLAG_TE_ACTIVE;
> + sc->sc_flags &= ~IWM_FLAG_HW_ERR;
> 
>   sc->sc_newstate(ic, IEEE80211_S_INIT, -1);
> 
> @@ -7169,7 +7170,6 @@ int
> iwm_intr(void *arg)
> {
>   struct iwm_softc *sc = arg;
> - struct ifnet *ifp = IC2IFP(&sc->sc_ic);
>   int handled = 0;
>   int r1, r2, rv = 0;
>   int isperiodic = 0;
> @@ -7218,6 +7218,15 @@ iwm_intr(void *arg)
>   /* ignored */
>   handled |= (r1 & (IWM_CSR_INT_BIT_ALIVE /*| IWM_CSR_INT_BIT_SCD*/));
> 
> + if (r1 & IWM_CSR_INT_BIT_RF_KILL) {
> + handled |= IWM_CSR_INT_BIT_RF_KILL;
> + if (iwm_check_rfkill(sc)) {
> + task_add(systq, &sc->init_task);
> + rv = 1;
> + goto out;
> + }
> + }
> +
>   if (r1 & IWM_CSR_INT_BIT_SW_ERR) {
> #ifdef IWM_DEBUG
>   int i;
> @@ -7238,7 +7247,6 @@ iwm_intr(void *arg)
> #endif
> 
>   printf("%s: fatal firmware error\n", DEVNAME(sc));
> - iwm_stop(ifp);
>   task_add(systq, &sc->init_task);
>   rv = 1;
>   goto out;
> @@ -7248,7 +7256,8 @@ iwm_intr(void *arg)
>   if (r1 & IWM_CSR_INT_BIT_HW_ERR) {
>   handled |= IWM_CSR_INT_BIT_HW_ERR;
>   printf("%s: hardware error, stopping device \n", DEVNAME(sc));
> - iwm_stop(ifp);
> + sc->sc_flags |= IWM_FLAG_HW_ERR;
> + task_add(systq, &sc->init_task);
>   rv = 1;
>   goto out;
>   }
> @@ -7262,13 +7271,6 @@ iwm_intr(void *arg)
>   wakeup(&sc->sc_fw);
>   }
> 
> - if (r1 & IWM_CSR_INT_BIT_RF_KILL) {
> - handled |= IWM_CSR_INT_BIT_RF_KILL;
> - if (iwm_check_rfkill(sc) && (ifp->if_flags & IFF_UP)) {
> - iwm_stop(ifp);
> - }
> - }
> -
>   if (r1 & IWM_CSR_INT_BIT_RX_PERIODIC) {
>   handled |= IWM_CSR_INT_BIT_RX_PERIODIC;
>   IWM_WRITE(sc, IWM_CSR_INT, IWM_CSR_INT_BIT_RX_PERIODIC);
> @@ -7739,23 +7741,27 @@ iwm_init_task(void *arg1)
> {
>   struct iwm_softc *sc = arg1;
>   struct ifnet *ifp = >sc_ic.ic_if;
> - int s;
> + int s = splnet();
>   int generation = sc->sc_generation;
> + int fatal = (sc->sc_flags & (IWM_FLAG_HW_ERR | IWM_FLAG_RFKILL));

(I presume that sleeping in rw_enter_write() drops splnet.)

>   rw_enter_write(&sc->ioctl_rwl);
>   if (generation != sc->sc_generation) {
>   rw_exit(&sc->ioctl_rwl);
> + splx(s);
>   return;
>   }
> - s = splnet();
> 
>   if (ifp->if_flags & IFF_RUNNING)
>   iwm_stop(ifp);
> - if ((ifp->if_flags & (IFF_UP | IFF_RUNNING)) == IFF_UP)
> + else if (sc->sc_flags & IWM_FLAG_HW_ERR)
> + sc->sc_flags &= ~IWM_FLAG_HW_ERR;

iwm_stop() clears IWM_FLAG_HW_ERR unconditionally, too, so at this point
IWM_FLAG_HW_ERR is clear. Suggest the simpler and equivalent
 
if (ifp->if_flags & IFF_RUNNING)
iwm_stop(ifp);

+   

Re: ping.c minor bug discrepancy between reported size of icmp packet

2018-06-09 Thread Richard Procter
Hi, 

On 9/06/2018, at 7:52 AM, Tom Smyth wrote:

> Hello I see a small discrepancy between the measurement
> of sent and received packets as displayed by ping command
> 
> on the wire the sent and received packets are the same size
> I had a brief go
> 
> foo# ping 5.134.88.1
> PING 5.134.88.1 (5.134.88.1): 56 data bytes
> 64 bytes from 5.134.88.1: icmp_seq=0 ttl=53 time=91.719 ms
> 
> it would appear that one measurement is the ICMP Payload only
> and the other is the ICMP payload + ICMP header  (8 byte difference)

Note that these values do not necessarily differ by 8 as one 
may receive something other than an ECHOREPLY in return e.g.

$ ping -s 40 -v 192.168.1.99
PING 192.168.1.99 (192.168.1.99): 40 data bytes
36 bytes from 192.168.5.1: Destination Host Unreachable

To my mind the two measurements are distinct, distinguished, 
and useful in a diagnostic tool.
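
For what it's worth, the arithmetic behind the two figures (my own
back-of-envelope sketch, not ping.c itself):

#include <stdio.h>

#define ICMP_HDRLEN	8	/* ICMP header */
#define IP_HDRLEN	20	/* IPv4 header without options */

int
main(void)
{
	int datalen = 40;	/* as in 'ping -s 40' above */

	/* an echo reply carries our payload back: header + payload */
	printf("echo reply: %d bytes\n", ICMP_HDRLEN + datalen);

	/* an ICMP error instead quotes the offending IP header + 8 bytes */
	printf("host unreachable: %d bytes\n", ICMP_HDRLEN + IP_HDRLEN + 8);
	return 0;
}

That prints 48 and 36; the 36 matches the Destination Host Unreachable line
above, and with the default 56 data bytes the first figure becomes the
familiar 64.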

As an aside, I have long hankered for ping(1) to advise me on how 
to hit or exceed MTU but it is unable (and ought to be unable) 
to do so, as that would involve multiple layer violations. 
For the same reason, it is unable to indicate total bytes sent. 
See the second paragraph of icmp(4) for starters.

best, 
Richard. 




> 
> 
> foo# grep -n " data bytes" /root/ping.c
> 760:printf("%s): %d data bytes\n", pr_addr(dst, dst->sa_len), datalen);
> foo# grep -n " bytes from" /root/ping.c
> 1248:   printf("%d bytes from %s: icmp_seq=%u", cc,
> 1292:   printf("%d bytes from %s: ", cc, pr_addr(from, fromlen));
> 
> looking at the source code it looks like the size = %d but %d is presenting
> different values
> I didnt see where %d was being changed between line 760 and line 1248
> It has been a while since I looked at C programming in anger and im a bit
> rusty...
> any pointers on where i should be looking so that I can submit a patch
> 
> -- 
> Kindest regards,
> Tom Smyth
> 



Re: ksh "clear-screen" editing command

2018-06-18 Thread Richard Procter


On 6/06/2018, at 10:20 AM, Alexander Hall wrote:

> This adds a "clear-screen" editing command to the emacs editing mode.
> This is the same name as bash and zsh uses, and then I stopped looking.
> 
> Thoughts? OK?

It's unclear to me what need is being addressed here --- are you
wanting a way to quickly hide sensitive info typed by mistake at a
commandline?

^Cclear^M is available, of course, or ^A^K, or the "bind -m '^L'=^Uclear'^J^Y'"
that benno@ mentioned. On the console, one can switch VT. I suppose some
window managers offer short-cuts to "lock screen", or can switch virtual
desktops, or minimise the window.

Yes, I could see how the patch might be warranted if this occurred
frequently, but how often do you find yourself needing it? Often enough
to be annoyed by the side-effects you mentioned of benno's key binding?

It seems to me this patch would make ksh into a screen-orientated shell
for the sake of reimplementing a rarely-used shortcut.

best, 
Richard.  

> 
> 
> /Alexander
> 
> 
> Index: emacs.c
> ===
> RCS file: /cvs/src/bin/ksh/emacs.c,v
> retrieving revision 1.84
> diff -u -p -r1.84 emacs.c
> --- emacs.c   16 Jan 2018 17:17:18 -  1.84
> +++ emacs.c   5 Jun 2018 22:03:49 -
> @@ -146,6 +146,7 @@ static intisu8cont(unsigned char);
> /* proto's for keybindings */
> static intx_abort(int);
> static intx_beg_hist(int);
> +static int   x_clear_screen(int);
> static intx_comp_comm(int);
> static intx_comp_file(int);
> static intx_complete(int);
> @@ -202,6 +203,7 @@ static intx_debug_info(int);
> static const struct x_ftab x_ftab[] = {
>   { x_abort,  "abort",0 },
>   { x_beg_hist,   "beginning-of-history", 0 },
> + { x_clear_screen,   "clear-screen", 0 },
>   { x_comp_comm,  "complete-command", 0 },
>   { x_comp_file,  "complete-file",0 },
>   { x_complete,   "complete", 0 },
> @@ -1004,12 +1006,19 @@ x_draw_line(int c)
> {
>   x_redraw(-1);
>   return KSTD;
> +}
> 
> +static int
> +x_clear_screen(int c)
> +{
> + x_redraw(-2);
> + return KSTD;
> }
> 
> -/* Redraw (part of) the line.  If limit is < 0, the everything is redrawn
> - * on a NEW line, otherwise limit is the screen column up to which needs
> - * redrawing.
> +/* Redraw (part of) the line.
> + * A non-negative limit is the screen column up to which needs
> + * redrawing. A limit of -1 redraws on a new line, while a limit
> + * of -2 (attempts to) clear the screen.
>  */
> static void
> x_redraw(int limit)
> @@ -1018,9 +1027,15 @@ x_redraw(int limit)
>   char*cp;
> 
>   x_adj_ok = 0;
> - if (limit == -1)
> + if (limit == -2) {
> + char *clearstr = str_val(global("CLEARSTR"));
> + if (clearstr == null)
> + clearstr = "\033[H\033[J";
> + x_e_puts(clearstr);
> + }
> + else if (limit == -1)
>   x_e_putc('\n');
> - else
> + else if (limit >= 0)
>   x_e_putc('\r');
>   x_flush();
>   if (xbp == xbuf) {
> Index: ksh.1
> ===
> RCS file: /cvs/src/bin/ksh/ksh.1,v
> retrieving revision 1.200
> diff -u -p -r1.200 ksh.1
> --- ksh.1 30 May 2018 21:20:52 -  1.200
> +++ ksh.1 5 Jun 2018 22:03:49 -
> @@ -1345,6 +1345,8 @@ Also, the
> .Ic cd
> built-in command will display the resulting directory when a match is found
> in any search path other than the empty path.
> +.It Ev CLEARSTR
> +If set, overrides the default escape sequence to clear the screen.
> .It Ev COLUMNS
> Set to the number of columns on the terminal or window.
> Currently set to the
> @@ -4690,6 +4692,12 @@ Moves the cursor to the beginning of the
> Uppercase the first character in the next
> .Ar n
> words, leaving the cursor past the end of the last word.
> +.It clear-screen:
> +Clears the screen and redraws the prompt and current input line.
> +If the
> +.Ev CLEARSTR
> +parameter is set, it is used to clear the screen.
> +Otherwise, a default escape sequence (^[[H^[2J) is used.
> .It comment: ^[#
> If the current line does not begin with a comment character, one is added at
> the beginning of the line and the line is entered (as if return had been
> 



Re: ipsec ah_massage_headers cleanup

2018-02-05 Thread Richard Procter


On Mon, 5 Feb 2018, Alexander Bluhm wrote:

> Hi,
> 
> While reading ah_massage_headers() for the erratas, I found some
> issues which are ugly, but not security relevant.
> 
> - Declare global array ipseczeroes with zeroes constant.
> - The proto parameter contains the address family, so call it af.
> - Remove an unused if block, just keep the else.
> - If m_copyback(M_NOWAIT) fails, return with error instead of working
>   with an inconsistent mbuf.
> - ip6_nxt is u_int8_t, no need to clear the high bits.
> - The offset and next protocol are advanced for all extension
>   headers, move it after the switch.
> - ah_massage_headers() returns an errno, call the variable error.
> 
> ok?

one comment inline 

ok procter@

> 
> bluhm
> 
> Index: netinet/ip_ah.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_ah.c,v
> retrieving revision 1.134
> diff -u -p -r1.134 ip_ah.c
> --- netinet/ip_ah.c   1 Feb 2018 21:06:31 -   1.134
> +++ netinet/ip_ah.c   5 Feb 2018 16:05:04 -
> @@ -80,7 +80,7 @@ voidah_output_cb(struct cryptop *);
>  void ah_input_cb(struct cryptop *);
>  int  ah_massage_headers(struct mbuf **, int, int, int, int);
>  
> -unsigned char ipseczeroes[IPSEC_ZEROES_SIZE]; /* zeroes! */
> +const unsigned char ipseczeroes[IPSEC_ZEROES_SIZE]; /* zeroes! */
>  
>  
>  /*
> @@ -190,21 +190,19 @@ ah_zeroize(struct tdb *tdbp)
>   * Massage IPv4/IPv6 headers for AH processing.
>   */
>  int
> -ah_massage_headers(struct mbuf **m0, int proto, int skip, int alg, int out)
> +ah_massage_headers(struct mbuf **m0, int af, int skip, int alg, int out)
>  {
>   struct mbuf *m = *m0;
>   unsigned char *ptr;
> - int off, count;
> -
> + int error, off, count;
>   struct ip *ip;
> -
>  #ifdef INET6
>   struct ip6_ext *ip6e;
>   struct ip6_hdr ip6;
>   int ad, alloc, nxt, noff;
>  #endif /* INET6 */
>  
> - switch (proto) {
> + switch (af) {
>   case AF_INET:
>   /*
>* This is the least painful way of dealing with IPv4 header
> @@ -229,10 +227,8 @@ ah_massage_headers(struct mbuf **m0, int
>  
>   /* IPv4 option processing */
>   for (off = sizeof(struct ip); off < skip;) {
> - if (ptr[off] == IPOPT_EOL || ptr[off] == IPOPT_NOP ||
> - off + 1 < skip)
> - ;
> - else {
> + if (ptr[off] != IPOPT_EOL && ptr[off] != IPOPT_NOP &&
> + off + 1 >= skip) {
>   DPRINTF(("%s: illegal IPv4 option length for"
>   " option %d\n", __func__, ptr[off]));
>  
> @@ -355,7 +351,14 @@ ah_massage_headers(struct mbuf **m0, int
>   ip6.ip6_dst.s6_addr16[1] = 0;
>  
>   /* Done with IPv6 header. */
> - m_copyback(m, 0, sizeof(struct ip6_hdr), &ip6, M_NOWAIT);
> + error = m_copyback(m, 0, sizeof(struct ip6_hdr), &ip6,
> + M_NOWAIT);
> + if (error) {
> + DPRINTF(("%s: m_copyback no memory", __func__));
> + ahstat_inc(ahs_hdrops);
> + m_freem(m);
> + return error;
> + }
>  
>   /* Let's deal with the remaining headers (if any). */
>   if (skip - sizeof(struct ip6_hdr) > 0) {
> @@ -386,7 +389,7 @@ ah_massage_headers(struct mbuf **m0, int
>   } else
>   break;
>  
> - nxt = ip6.ip6_nxt & 0xff; /* Next header type. */
> + nxt = ip6.ip6_nxt;  /* Next header type. */
>  
>   for (off = 0; off < skip - sizeof(struct ip6_hdr);) {
>   if (off + sizeof(struct ip6_ext) >
> @@ -428,10 +431,6 @@ ah_massage_headers(struct mbuf **m0, int
>  
>   if (count != noff)
>   goto error6;
> -
> - /* Advance. */
> - off += ((ip6e->ip6e_len + 1) << 3);
> - nxt = ip6e->ip6e_nxt;
>   break;
>  
>   case IPPROTO_ROUTING:
> @@ -471,15 +470,17 @@ ah_massage_headers(struct mbuf **m0, int
>   (caddr_t));
>   addr[0] = ip6.ip6_dst;
>   ip6.ip6_dst = finaldst;
> - m_copyback(m, 0, sizeof(ip6), &ip6,
> - M_NOWAIT);
> -
> + error = m_copyback(m, 0, sizeof(ip6),
> + &ip6, M_NOWAIT);
> + if (error) {
> + if (alloc)
> + free(ptr, M_XDATA, 0);
> +  

Re: pfctl: use mnemonic macros, terminate string with null char

2019-01-19 Thread Richard Procter
Hi, 

Comments below,

> On 19/01/2019, at 2:32 PM, Klemens Nanni  wrote:
> 
> A few assorted nits for consistency and proper format, no object change.
> 
> OK?
> 
> Index: pfctl.c
> ===
> RCS file: /cvs/src/sbin/pfctl/pfctl.c,v
> retrieving revision 1.365
> diff -u -p -r1.365 pfctl.c
> --- pfctl.c   11 Jan 2019 03:09:24 -  1.365
> +++ pfctl.c   19 Jan 2019 01:29:20 -
> @@ -1485,7 +1485,6 @@ pfctl_load_ruleset(struct pfctl *pf, cha
>   }
>   } else if (pf->opts & PF_OPT_VERBOSE)
>   printf("\n");
> -
>   }
> 
>   if (pf->optimize)
> @@ -1851,7 +1850,6 @@ pfctl_set_limit(struct pfctl *pf, const 
> {
>   int i;
> 
> -
>   for (i = 0; pf_limits[i].name; i++) {
>   if (strcasecmp(opt, pf_limits[i].name) == 0) {
>   pf->limit[pf_limits[i].index] = limit;
> @@ -2217,7 +2215,7 @@ pfctl_show_anchors(int dev, int opts, ch
>   err(1, "DIOCGETRULESET");
>   if (!strcmp(pr.name, PF_RESERVED_ANCHOR))
>   continue;
> - sub[0] = 0;
> + sub[0] = '\0';
>   if (pr.path[0]) {
>   strlcat(sub, pr.path, sizeof(sub));
>   strlcat(sub, "/", sizeof(sub));
> @@ -2235,6 +2233,7 @@ const char *
> pfctl_lookup_option(char *cmd, const char **list)
> {
>   const char *item = NULL;
> +
>   if (cmd != NULL && *cmd)
>   for (; *list; list++)
>   if (!strncmp(cmd, *list, strlen(cmd))) {
> @@ -2580,15 +2579,15 @@ main(int argc, char *argv[])
>   opts |= PF_OPT_SHOWALL;
>   pfctl_load_fingerprints(dev, opts);
> 
> - pfctl_show_rules(dev, path, opts, 0, anchorname,
> - 0, 0, -1);
> + pfctl_show_rules(dev, path, opts, PFCTL_SHOW_RULES,
> + anchorname, 0, 0, -1);
>   pfctl_show_queues(dev, ifaceopt, opts,
>   opts & PF_OPT_VERBOSE2);
>   pfctl_show_states(dev, ifaceopt, opts, -1);
>   pfctl_show_src_nodes(dev, opts);
>   pfctl_show_status(dev, opts);
> - pfctl_show_rules(dev, path, opts, 1, anchorname,
> - 0, 0, -1);
> + pfctl_show_rules(dev, path, opts, PFCTL_SHOW_LABELS,
> + anchorname, 0, 0, -1);
>   pfctl_show_timeouts(dev, opts);
>   pfctl_show_limits(dev, opts);
>   pfctl_show_tables(anchorname, opts);
> @@ -2671,7 +2670,7 @@ main(int argc, char *argv[])
>   if (optiopt != NULL) {
>   switch (*optiopt) {
>   case 'n':
> - optimize = 0;
> + optimize = PF_OPTIMIZE_NONE;
>   break;
>   case 'b':
>   optimize |= PF_OPTIMIZE_BASIC;
> Index: pfctl_parser.h
> ===
> RCS file: /cvs/src/sbin/pfctl/pfctl_parser.h,v
> retrieving revision 1.112
> diff -u -p -r1.112 pfctl_parser.h
> --- pfctl_parser.h6 Sep 2018 15:07:34 -   1.112
> +++ pfctl_parser.h19 Jan 2019 01:13:13 -
> @@ -57,6 +57,7 @@
> #define PF_NAT_PROXY_PORT_LOW 50001
> #define PF_NAT_PROXY_PORT_HIGH65535
> 
> +#define PF_OPTIMIZE_NONE 0x

these PF_OPTIMIZE_* are bit-field definitions,
see e.g. pfctl_optimize.c:299. 

But PF_OPTIMIZE_NONE is not, as pf->optimize & PF_OPTIMIZE_NONE 
is never true, and pf->optimize |= PF_OPTIMIZE_NONE has no effect. 

so I would leave this as optimize = 0; and drop PF_OPTIMIZE_NONE.
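
A toy illustration of the point (not pfctl code, just the bit arithmetic):

#include <stdio.h>

#define OPT_NONE	0x0000
#define OPT_BASIC	0x0001

int
main(void)
{
	int opts = 0;

	opts |= OPT_NONE;	/* no effect: opts is still 0 */
	if (opts & OPT_NONE)	/* never true: (x & 0) == 0 */
		printf("unreachable\n");

	opts |= OPT_BASIC;
	printf("opts = 0x%04x\n", opts);
	return 0;
}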

otherwise, ok procter

best, 
Richard. 


> #define PF_OPTIMIZE_BASIC 0x0001
> #define PF_OPTIMIZE_PROFILE   0x0002
> 
> 



Re: iwm(4) WPA2 crypto hardware offload

2019-08-04 Thread Richard Procter
Hi Stefan, 

> On 2/08/2019, at 11:13 PM, Stefan Sperling  wrote:
> 
> This diff enables HW offload for WPA2 CCMP (AES) encrypted unicast
> frames in iwm(4). This is in preparation for Tx aggregation support.

Lightly tested on:

iwm0 at pci1 dev 0 function 0 "Intel Dual Band Wireless AC 7265" rev 0x69, msi
iwm0: hw rev 0x210, fw ver 16.242414.0, address xx:xx:xx:xx:xx:xx

pkg_add -u; download a few big files; ifconfig down; ifconfig up; 
change network.

comments inline. supposing the async set keys command is a non-issue then,
for what it’s worth from this 802.11 novice, ok procter@

thanks! 
Richard. 

> WEP and WPA1/TKIP ciphers are still handled in software, which mirrors
> what the older iwn(4) driver is doing. We don't enable 11n at all with
> those ciphers (see ieee80211_ht_negotiate()), and we won't aggregate
> non-encrypted frames (see ieee80211_can_use_ampdu()).
> 
> Based on an initial diff by procter@ and some code from iwn(4).
> 
> Tested on 7260, 7265, 8260, and 8265 in bsd.rd upgrade and with pkg_add -u.
> 
> ok?
> 
> diff refs/heads/master refs/heads/iwm-hwcrypt
> blob - 7add1e9e682ef5e22169ec1e89a182cda1af7e2a
> blob + 839c0a0f8b3a62115ba6d5e15adfa63158475c86
> --- sys/dev/pci/if_iwm.c
> +++ sys/dev/pci/if_iwm.c
> @@ -367,6 +367,8 @@ int   iwm_get_signal_strength(struct iwm_softc *, 
> struct
> void  iwm_rx_rx_phy_cmd(struct iwm_softc *, struct iwm_rx_packet *,
>   struct iwm_rx_data *);
> int   iwm_get_noise(const struct iwm_statistics_rx_non_phy *);
> +int  iwm_ccmp_decap(struct iwm_softc *, struct mbuf *,
> + struct ieee80211_node *);
> void  iwm_rx_rx_mpdu(struct iwm_softc *, struct iwm_rx_packet *,
>   struct iwm_rx_data *);
> void  iwm_enable_ht_cck_fallback(struct iwm_softc *, struct iwm_node *);
> @@ -448,6 +450,10 @@ int  iwm_disassoc(struct iwm_softc *);
> int   iwm_run(struct iwm_softc *);
> int   iwm_run_stop(struct iwm_softc *);
> struct ieee80211_node *iwm_node_alloc(struct ieee80211com *);
> +int  iwm_set_key(struct ieee80211com *, struct ieee80211_node *,
> + struct ieee80211_key *);
> +void iwm_delete_key(struct ieee80211com *,
> + struct ieee80211_node *, struct ieee80211_key *);
> void  iwm_calib_timeout(void *);
> int   iwm_media_change(struct ifnet *);
> void  iwm_newstate_task(void *);
> @@ -3429,11 +3435,91 @@ iwm_get_noise(const struct iwm_statistics_rx_non_phy *
>   return (nbant == 0) ? -127 : (total / nbant) - 107;
> }
> 
> +int
> +iwm_ccmp_decap(struct iwm_softc *sc, struct mbuf *m, struct ieee80211_node 
> *ni)
> +{
> + struct ieee80211com *ic = &sc->sc_ic;
> + struct ieee80211_key *k = &ni->ni_pairwise_key;
> + struct ieee80211_frame *wh;
> + struct ieee80211_rx_ba *ba;
> + uint64_t pn, *prsc;
> + uint8_t *ivp;
> + uint8_t tid;
> + int hdrlen, hasqos;
> +
> + wh = mtod(m, struct ieee80211_frame *);
> + hdrlen = ieee80211_get_hdrlen(wh);
> + ivp = (uint8_t *)wh + hdrlen;
> +
> + /* Check that the ExtIV bit is set. */
> + if (!(ivp[3] & IEEE80211_WEP_EXTIV)) {
> + DPRINTF(("CCMP decap ExtIV not set\n"));
> + return 1;
> + }
> + hasqos = ieee80211_has_qos(wh);
> + tid = hasqos ? ieee80211_get_qos(wh) & IEEE80211_QOS_TID : 0;
> + ba = hasqos ? &ni->ni_rx_ba[tid] : NULL;
> + prsc = &k->k_rsc[tid];
> +
> + /* Extract the 48-bit PN from the CCMP header. */
> + pn = (uint64_t)ivp[0]   |
> +  (uint64_t)ivp[1] <<  8 |
> +  (uint64_t)ivp[4] << 16 |
> +  (uint64_t)ivp[5] << 24 |
> +  (uint64_t)ivp[6] << 32 |
> +  (uint64_t)ivp[7] << 40;
> + if (pn <= *prsc) {
> + if (hasqos && ba->ba_state == IEEE80211_BA_AGREED) {
> + /*
> +  * This is an A-MPDU subframe.
> +  * Such frames may be received out of order due to
> +  * legitimate retransmissions of failed subframes
> +  * in previous A-MPDUs. Duplicates will be handled
> +  * in ieee80211_input() as part of A-MPDU reordering.
> +  */
> + } else if (ieee80211_has_seq(wh)) {
> + /*
> +  * Not necessarily a replayed frame since we did not
> +  * check the sequence number of the 802.11 header yet.
> +  */
> + int nrxseq, orxseq;
> +
> + nrxseq = letoh16(*(u_int16_t *)wh->i_seq) >>
> + IEEE80211_SEQ_SEQ_SHIFT;
> + if (hasqos)
> + orxseq = ni->ni_qos_rxseqs[tid];
> + else
> + orxseq = ni->ni_rxseq;
> + if (nrxseq < orxseq) {
> + DPRINTF(("CCMP replayed (n=%d < o=%d)\n",
> + nrxseq, orxseq));
> + ic->ic_stats.is_ccmp_replays++;
> +

Re: iwm: two small Rx code path fixes

2019-08-29 Thread Richard Procter
Hi Stefan,

On 28/08/2019, Stefan Sperling  wrote:
> 1) Fix max frame length check to account for the firmware's Rx result
>header in the buffer, which contains two uint16_t fields.
>Frame data begins after this header.
>
> 2) Do not write to mbuf length fields before the mbuf has been removed
>from the Rx ring.
>Based on dragonfly commit 96eaecf93d9f731459a0df8efc72cfad034320bd
>by Imre Vadasz.
>
> Tested on 8260; ping and tcpbench still work.

Tested pings, bulk downloads, etc, with no ill effects seen.

ok procter@

iwm0 at pci1 dev 0 function 0 "Intel Dual Band Wireless AC 7265" rev 0x69, msi
iwm0: hw rev 0x210, fw ver 16.242414.0, address xx:xx:xx:xx:xx:xx

best,
Richard.

>
> ok?
>
> diff refs/heads/master refs/heads/iwm-mbuf-hacks
> blob - 038e6a63dfff113b525bb1e9a30c935996535569
> blob + b586f473be83a7a68e251700242ba19baa60d83c
> --- sys/dev/pci/if_iwm.c
> +++ sys/dev/pci/if_iwm.c
> @@ -3540,46 +3540,45 @@ iwm_rx_rx_mpdu(struct iwm_softc *sc, struct
> iwm_rx_pac
>   rx_res = (struct iwm_rx_mpdu_res_start *)pkt->data;
>   wh = (struct ieee80211_frame *)(pkt->data + sizeof(*rx_res));
>   len = le16toh(rx_res->byte_count);
>   if (len < IEEE80211_MIN_LEN) {
>   ic->ic_stats.is_rx_tooshort++;
>   IC2IFP(ic)->if_ierrors++;
>   return;
>   }
> - if (len > IWM_RBUF_SIZE) {
> + if (len > IWM_RBUF_SIZE - sizeof(*rx_res)) {
>   IC2IFP(ic)->if_ierrors++;
>   return;
>   }
>   rx_pkt_status = le32toh(*(uint32_t *)(pkt->data +
>   sizeof(*rx_res) + len));
>
> - m = data->m;
> - m->m_data = pkt->data + sizeof(*rx_res);
> - m->m_pkthdr.len = m->m_len = len;
> -
>   if (__predict_false(phy_info->cfg_phy_cnt > 20))
>   return;
>
>   if (!(rx_pkt_status & IWM_RX_MPDU_RES_STATUS_CRC_OK) ||
>   !(rx_pkt_status & IWM_RX_MPDU_RES_STATUS_OVERRUN_OK))
>   return; /* drop */
>
> + m = data->m;
> + if (iwm_rx_addbuf(sc, IWM_RBUF_SIZE, sc->rxq.cur) != 0)
> + return;
> + m->m_data = pkt->data + sizeof(*rx_res);
> + m->m_pkthdr.len = m->m_len = len;
> +
>   device_timestamp = le32toh(phy_info->system_timestamp);
>
>   if (sc->sc_capaflags & IWM_UCODE_TLV_FLAGS_RX_ENERGY_API) {
>   rssi = iwm_get_signal_strength(sc, phy_info);
>   } else {
>   rssi = iwm_calc_rssi(sc, phy_info);
>   }
>   rssi = (0 - IWM_MIN_DBM) + rssi;/* normalize */
>   rssi = MIN(rssi, ic->ic_max_rssi);  /* clip to max. 100% */
> -
> - if (iwm_rx_addbuf(sc, IWM_RBUF_SIZE, sc->rxq.cur) != 0)
> - return;
>
>   chanidx = letoh32(phy_info->channel);
>   if (chanidx < 0 || chanidx >= nitems(ic->ic_channels))  
>   chanidx = ieee80211_chan2ieee(ic, ic->ic_ibss_chan);
>
>   ni = ieee80211_find_rxnode(ic, wh);
>   if (ni == ic->ic_bss) {
>   /*
>
>



Re: iwm(4) WPA2 crypto hardware offload

2019-08-05 Thread Richard Procter


> On 4/08/2019, at 11:21 PM, Stefan Sperling  wrote:
> On Sun, Aug 04, 2019 at 10:20:24PM +1200, Richard Procter wrote:
>> 
>>> +
>>> +   return iwm_send_cmd_pdu(sc, IWM_ADD_STA_KEY, IWM_CMD_ASYNC,
>>> +   sizeof(cmd), &cmd);
>> 
>> Might the async command open a race between loading keys and decrypting 
>> received packets?
> 
> It can lead to all sorts of races but we have no other option because
> net80211 runs lots of code in interrupt context. E.g. we might decide
> to switch to a new SSID after a scan, which means we have to delete
> the current key and perhaps set a new key. And all this happens in a
> context where we're not allowed to sleep. [snip elaboration]

Ah, I see. 

> Regarding the small window between command initiation and success:
> Encryption only affects unicast data frames, and we've already completed
> a 4-way handshake with the peer which means newly arriving data frames will
> be encrypted. Management frames will still pass. I guess the worst that
> might happen is that some data frames sent before the key was installed
> might be retransmitted because the firmware would fail to decrypt them.

> FWIW, iwn(4) has had the same problem forever and I don't see anyone
> complaining.

Ok, no objection. 

btw, I’ve now also tested the fall-back software path against a TKIP AP.

best, 
Richard. 



Re: [PATCH] add ping(1)-like stats to tcpbench(1)

2020-05-03 Thread Richard Procter



> On 4/05/2020, at 9:30 AM, Stuart Henderson  wrote:
> 
> On 2020/05/04 09:23, Richard Procter wrote:
>> I like it. 
>> 
>> Assuming a mention in tcpbench.1 - ok procter
> 
> ok like this?  text stolen from ping.

The server has a small edge case: it refuses to display stats
that don’t exist, e.g. if no data transfer has occurred, though
it will still indicate receipt of SIGINFO by printing ‘\n’.

I see ping(1) in the same circumstance omits the stats line alone.

But that’s perhaps something for a later commit. I don’t think it 
warrants mention in the man page. 

ok procter

> Index: tcpbench.1
> ===
> RCS file: /cvs/src/usr.bin/tcpbench/tcpbench.1,v
> retrieving revision 1.27
> diff -u -p -r1.27 tcpbench.1
> --- tcpbench.12 May 2020 22:00:29 -   1.27
> +++ tcpbench.13 May 2020 21:29:22 -
> @@ -82,6 +82,15 @@ Its accuracy may be increased by decreas
> .Ar interval .
> The summary bytes and duration cover the interval from transfer start
> to process exit.
> +The summary information can also be displayed while
> +.Nm
> +is running by sending it a
> +.Dv SIGINFO
> +signal (see the
> +.Cm status
> +argument of
> +.Xr stty 1
> +for more information).
> .Pp
> The options are as follows:
> .Bl -tag -width Ds
> Index: tcpbench.c
> ===
> RCS file: /cvs/src/usr.bin/tcpbench/tcpbench.c,v
> retrieving revision 1.62
> diff -u -p -r1.62 tcpbench.c
> --- tcpbench.c2 May 2020 22:00:29 -   1.62
> +++ tcpbench.c3 May 2020 21:29:22 -
> @@ -213,6 +213,10 @@ signal_handler(int sig, short event, voi
>* signal handler rules don't apply, libevent decouples for us
>*/
>   switch (sig) {
> + case SIGINFO:
> + printf("\n");
> + wrapup(-1);
> + break;
>   case SIGINT:
>   printf("\n");
>   wrapup(0);
> @@ -1109,7 +1113,8 @@ wrapup(int err)
>   summary_display();
>   }
> 
> - exit(err);
> + if (err != -1)
> + exit(err);
> }
> 
> int
> @@ -1126,7 +1131,7 @@ main(int argc, char **argv)
>   int family = PF_UNSPEC;
>   struct nlist nl[] = { { "_tcbtable" }, { "" } };
>   const char *host = NULL, *port = DEFAULT_PORT, *srcbind = NULL;
> - struct event ev_sigint, ev_sigterm, ev_sighup, ev_progtimer;
> + struct event ev_sigint, ev_sigterm, ev_sighup, ev_siginfo, ev_progtimer;
>   struct sockaddr_un sock_un;
> 
>   /* Init world */
> @@ -1362,9 +1367,11 @@ main(int argc, char **argv)
>   signal_set(&ev_sigterm, SIGTERM, signal_handler, NULL);
>   signal_set(&ev_sighup, SIGHUP, signal_handler, NULL);
>   signal_set(&ev_sigint, SIGINT, signal_handler, NULL);
> + signal_set(&ev_siginfo, SIGINFO, signal_handler, NULL);
>   signal_add(&ev_sigint, NULL);
>   signal_add(&ev_sigterm, NULL);
>   signal_add(&ev_sighup, NULL);
> + signal_add(&ev_siginfo, NULL);
>   signal(SIGPIPE, SIG_IGN);
> 
>   if (UDP_MODE) {
> 



Re: [PATCH] add ping(1)-like stats to tcpbench(1)

2020-05-03 Thread Richard Procter
I like it. 

Assuming a mention in tcpbench.1 - ok procter

> On 4/05/2020, at 4:25 AM, Stuart Henderson  wrote:
> 
> Is it worth triggering this on SIGINFO? I use that often with ping(1).
> 
> Index: tcpbench.c
> ===
> RCS file: /cvs/src/usr.bin/tcpbench/tcpbench.c,v
> retrieving revision 1.62
> diff -u -p -r1.62 tcpbench.c
> --- tcpbench.c2 May 2020 22:00:29 -   1.62
> +++ tcpbench.c3 May 2020 16:24:05 -
> @@ -213,6 +213,10 @@ signal_handler(int sig, short event, voi
>* signal handler rules don't apply, libevent decouples for us
>*/
>   switch (sig) {
> + case SIGINFO:
> + printf("\n");
> + wrapup(-1);
> + break;
>   case SIGINT:
>   printf("\n");
>   wrapup(0);
> @@ -1109,7 +1113,8 @@ wrapup(int err)
>   summary_display();
>   }
> 
> - exit(err);
> + if (err != -1)
> + exit(err);
> }
> 
> int
> @@ -1126,7 +1131,7 @@ main(int argc, char **argv)
>   int family = PF_UNSPEC;
>   struct nlist nl[] = { { "_tcbtable" }, { "" } };
>   const char *host = NULL, *port = DEFAULT_PORT, *srcbind = NULL;
> - struct event ev_sigint, ev_sigterm, ev_sighup, ev_progtimer;
> + struct event ev_sigint, ev_sigterm, ev_sighup, ev_siginfo, ev_progtimer;
>   struct sockaddr_un sock_un;
> 
>   /* Init world */
> @@ -1362,9 +1367,11 @@ main(int argc, char **argv)
>   signal_set(&ev_sigterm, SIGTERM, signal_handler, NULL);
>   signal_set(&ev_sighup, SIGHUP, signal_handler, NULL);
>   signal_set(&ev_sigint, SIGINT, signal_handler, NULL);
> + signal_set(&ev_siginfo, SIGINFO, signal_handler, NULL);
>   signal_add(&ev_sigint, NULL);
>   signal_add(&ev_sigterm, NULL);
>   signal_add(&ev_sighup, NULL);
> + signal_add(&ev_siginfo, NULL);
>   signal(SIGPIPE, SIG_IGN);
> 
>   if (UDP_MODE) {
> 



Re: mcx(4) checksum offload

2020-05-19 Thread Richard Procter


> On 19/05/2020, at 11:19 AM, Alexandr Nedvedicky wrote:
> 
> I'm not sure about outbound path. chksum offload on outbound side got killed
> back in 2016 (I think). All that logic behind dealing with various HW did not
> pay off. If PF is doing NAT it's far more simple and reliable if we compute
> chksum in OS. 

This is not quite correct: although we stopped regenerating checksums in 2016,
we continue to compute checksums late in the output path if needed[0], and
offload that to the card if supported.

The mbuf M_*_CSUM_OUT flags indicate if checksumming is needed, and offloading
drivers pass this state through to the card; the diff should do this, too.
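
For illustration, a sketch of the kind of mapping the transmit path would make
(the M_*_CSUM_OUT flags are the real mbuf ones; the MCX_SQE_* bits and the
helper name are placeholders of mine, not the actual descriptor layout):

	/*
	 * Sketch only: translate the stack's checksum requests into
	 * hypothetical hardware offload bits on the send descriptor.
	 */
	static uint32_t
	csum_flags_to_hw(const struct mbuf *m)
	{
		uint32_t flags = 0;

		if (m->m_pkthdr.csum_flags & M_IPV4_CSUM_OUT)
			flags |= MCX_SQE_L3_CSUM;	/* placeholder name */
		if (m->m_pkthdr.csum_flags & (M_TCP_CSUM_OUT | M_UDP_CSUM_OUT))
			flags |= MCX_SQE_L4_CSUM;	/* placeholder name */
		return (flags);
	}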

cheers, 
Richard.

[0] i.e. for packets of local origin; 
see netinet/ip_output.c:in_proto_cksum_out()



Re: Donation of Power PC Based boards RB800 I have 5x if they are any use for training / testing

2020-05-19 Thread Richard Procter
Hi Tom, 

Thanks for offering.

I built a personal OpenBSD/socppc 6.5 release last year for my RB600A and
continue to use it. I tried building 6.6 but found compiling clang infeasible
on a platform with only 128MB of memory and PIO compact flash. I tried to
improve the swap experience by adding an SSD but gave up due to problems with
my miniPCI SATA card.

So although I’m fond of the boards I do not expect to see further development
on them. (That said, it should be possible in principle to compile a socppc
kernel and use the macppc userspace; and if you or anyone reading would like
the socppc 6.5 release, contact me off-list.)

Regards the RB800, from what I understand these require significant effort
to support as they use a substantially different PowerPC core from that
supported by our powerpc code[0].

cheers, 
Richard.

[0] https://marc.info/?l=openbsd-misc=126938500604186=2


> On 20/05/2020, at 5:20 AM, Tom Smyth  wrote:
> 
> Hello Devs,
> Thanks again for all your work on OpenBSD 6.7
> just checking if any of you would need / want RB800s,   as per mail below,
> 
> I also have 2x RB600s which at one stage did run OpenBSD  PPC edition
> if any dev want them contact me off list and Ill have them shipped to you
> Thanks and stay safe people ...
> 1 person non dev expressed an interest in them but I would like to
> give preference
> to those who work on hardware dev testing in OpenBSD ...  if the boards were
> in fact useful
> Thanks
> Tom Smyth
> 
> On Tue, 12 May 2020 at 21:12, Tom Smyth  wrote:
>> 
>> Hello
>> does any OpenBSD Developer want some Power PC SBC  the specs
>> 
>> 
>> Product code RB800
>> CPU  MPC8533EVTALF
>> CPU core count 1
>> CPU nominal frequency800 MHz
>> RAM   256MB
>> onboard NAND storage 512MB
>> 
>> there are 4x Mini PCI slots
>> and 1x PCI-E
>> and 1 Compact Flash slots
>> 3x 1Gb/s Ports
>> 
>> 
>> they have a wide input voltage range for powering and can be powered via POE
>> 
>> I have atleast 5x in stock and will ship them to any interested Developer ?
>> 
>> https://mikrotik.com/product/RB800#fndtn-specifications
>> 
>> --
>> Kindest regards,
>> Tom Smyth.
> 
> 
> 
> -- 
> Kindest regards,
> Tom Smyth.
> 



Re: TCP congestion window pinned by integer arithmetic

2020-06-09 Thread Richard Procter
Hi, 

> On 2/06/2020, at 9:41 AM, Brian Brombacher  wrote:
> 
> Hi,
> 
> RFC 5681 Section 3.1 has an Implementation Note that covers the bug fixed by 
> the following patch.  I ran into this bug testing on a high latency link.  My 
> congestion window was pinned to a specific value and could not open further, 
> causing a lack of bandwidth utilization.
> 
> I chose if/assign rather than max(9) for clarity with the RFC; however, both 
> work fine.

As discussed off-list, I am setting up a test harness for this.
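
In the meantime, a self-contained illustration of the truncation the patch
addresses (my own example, not part of the diff): with the congestion
avoidance increment incr = maxseg * maxseg / cw, the integer division yields
zero once cw exceeds maxseg squared, so the window stops opening.

	#include <stdio.h>

	int
	main(void)
	{
		unsigned int maxseg = 1460;	/* typical ethernet MSS */
		unsigned int cw = 2200000;	/* > 1460 * 1460 = 2131600 */
		unsigned int incr = maxseg * maxseg / cw;

		printf("incr = %u\n", incr);	/* prints 0: cwnd is pinned */
		if (incr == 0)			/* RFC 5681 3.1 implementation note */
			incr = 1;
		printf("incr = %u\n", incr);	/* prints 1: cwnd can open again */
		return (0);
	}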

cheers,
Richard.


> Index: tcp_input.c
> ===
> RCS file: /home/brian/cvs/src/sys/netinet/tcp_input.c,v
> retrieving revision 1.364
> diff -u -r1.364 tcp_input.c
> --- tcp_input.c   6 Dec 2019 14:43:14 -   1.364
> +++ tcp_input.c   1 Jun 2020 21:16:26 -
> @@ -1707,8 +1707,11 @@
>   u_int cw = tp->snd_cwnd;
>   u_int incr = tp->t_maxseg;
> 
> - if (cw > tp->snd_ssthresh)
> + if (cw > tp->snd_ssthresh) {
>   incr = incr * incr / cw;
> + if (incr == 0)
> + incr = 1;
> + }
>   if (tp->t_dupacks < tcprexmtthresh)
>   tp->snd_cwnd = ulmin(cw + incr,
>   TCP_MAXWIN << tp->snd_scale);
> 



Re: bgpd refactor timer code

2020-12-10 Thread Richard Procter
Hi Claudio, 

> On 10/12/2020, at 1:13 AM, Claudio Jeker  wrote:
> 
> This diff makes the timer code independent from struct peer. This way
> it can be used in different places without too much issues.

ok procter@ 

> OK?
> -- 
> :wq Claudio
> 
> Index: control.c
> ===
> RCS file: /cvs/src/usr.sbin/bgpd/control.c,v
> retrieving revision 1.101
> diff -u -p -r1.101 control.c
> --- control.c 5 Nov 2020 11:28:11 -   1.101
> +++ control.c 9 Dec 2020 11:57:05 -
> @@ -333,7 +333,8 @@ control_dispatch_msg(struct pollfd *pfd,
>   IMSG_CTL_SHOW_NEIGHBOR,
>   0, 0, -1, p, sizeof(*p));
>   for (i = 1; i < Timer_Max; i++) {
> - if (!timer_running(p, i, &d))
> + if (!timer_running(&p->timers,
> + i, &d))
>   continue;
>   ct.type = i;
>   ct.val = d;
> @@ -403,7 +404,8 @@ control_dispatch_msg(struct pollfd *pfd,
>   if (!p->conf.down) {
>   session_stop(p,
>   ERR_CEASE_ADMIN_RESET);
> - timer_set(p, Timer_IdleHold,
> + timer_set(&p->timers,
> + Timer_IdleHold,
>   SESSION_CLEAR_DELAY);
>   } else {
>   session_stop(p,
> Index: session.c
> ===
> RCS file: /cvs/src/usr.sbin/bgpd/session.c,v
> retrieving revision 1.405
> diff -u -p -r1.405 session.c
> --- session.c 5 Nov 2020 14:44:59 -   1.405
> +++ session.c 9 Dec 2020 11:56:19 -
> @@ -263,7 +263,8 @@ session_main(int debug, int verbose)
>   if (p->reconf_action == RECONF_REINIT) {
>   session_stop(p, ERR_CEASE_ADMIN_RESET);
>   if (!p->conf.down)
> - timer_set(p, Timer_IdleHold, 0);
> + timer_set(&p->timers,
> + Timer_IdleHold, 0);
>   }
> 
>   /* deletion due? */
> @@ -272,7 +273,7 @@ session_main(int debug, int verbose)
>   session_demote(p, -1);
>   p->conf.demote_group[0] = 0;
>   session_stop(p, ERR_CEASE_PEER_UNCONF);
> - timer_remove_all(p);
> + timer_remove_all(&p->timers);
>   tcp_md5_del_listener(conf, p);
>   log_peer_warnx(>conf, "removed");
>   RB_REMOVE(peer_head, &conf->peers, p);
> @@ -366,10 +367,10 @@ session_main(int debug, int verbose)
>   now = getmonotime();
>   RB_FOREACH(p, peer_head, &conf->peers) {
>   time_t  nextaction;
> - struct peer_timer *pt;
> + struct timer *pt;
> 
>   /* check timers */
> - if ((pt = timer_nextisdue(p, now)) != NULL) {
> + if ((pt = timer_nextisdue(&p->timers, now)) != NULL) {
>   switch (pt->type) {
>   case Timer_Hold:
>   bgp_fsm(p, EVNT_TIMER_HOLDTIME);
> @@ -387,24 +388,27 @@ session_main(int debug, int verbose)
>   p->IdleHoldTime =
>   INTERVAL_IDLE_HOLD_INITIAL;
>   p->errcnt = 0;
> - timer_stop(p, Timer_IdleHoldReset);
> + timer_stop(&p->timers,
> + Timer_IdleHoldReset);
>   break;
>   case Timer_CarpUndemote:
> - timer_stop(p, Timer_CarpUndemote);
> + timer_stop(&p->timers,
> + Timer_CarpUndemote);
>   if (p->demoted &&
>   p->state == STATE_ESTABLISHED)
>   session_demote(p, -1);
>  

Re: Wireguard: can't remove multiple peers at once.

2021-01-24 Thread Richard Procter
Hi, 

> On 14/01/2021, at 8:33 PM, YASUOKA Masahiko  wrote:
> 
> Hi,
> 
> On Thu, 14 Jan 2021 08:54:36 +0900
> Yuichiro NAITO  wrote:
>> Does anybody please review my code?
>> 
>> Yasuoka-san is my coleague of my work.
>> So, he is interested in this topic. That’s why I CCed this mail.
>> I don’t mean he is an reviewer.
>> 
>>> On 2021/01/12 11:27, Yuichiro NAITO wrote:
>>> I have set up multiple peers in a wg0 interface,
>>> and tried to remove more than one peers at once.
>>> Ifconfig(1) only removes the first peer.
>>> 
>>> Command line was like following.
>>> 
>>> ```
>>> # ifconfig wg0 -wgpeer  -wgpeer  -wgpeer 
>>> ```
>>> 
>>> Only  was removed.
>>> 
>>> I think next peer pointer isn't calculated in case of removing peer
>>> in sys/net/if_wg.c: wg_ioctl_set() function.
>>> 
>>> I have tried following patch that can fix this problem.
> 
> Yes, the diff seems good.
> 
> I made the following whitespace change.
> 
>> @@ -2333,6 +2333,11 @@ wg_ioctl_set(struct wg_softc *sc, struct wg_data_io *data)
>>  }
>> 
>>  peer_p = (struct wg_peer_io *)aip_p;
>> +continue;
>> +next_peer:
>> +aip_p = &peer_p->p_aips[0];
>> +aip_p += peer_o.p_aips_count;
>> +peer_p = (struct wg_peer_io *)aip_p;
>>  }
>> 
>> error:
> 
> It seems we prefer putting goto labels at the beginning of the line.
> 
> 
> ok?

ok procter@ 

> 
> Fix wg(4) ioctl to be able to handle multiple wgpeers.
> Diff from Yuichiro NAITO.
> 
> Index: sys/net/if_wg.c
> ===
> RCS file: /cvs/src/sys/net/if_wg.c,v
> retrieving revision 1.14
> diff -u -p -r1.14 if_wg.c
> --- sys/net/if_wg.c   1 Sep 2020 19:06:59 -   1.14
> +++ sys/net/if_wg.c   14 Jan 2021 07:26:48 -
> @@ -2270,7 +2270,7 @@ wg_ioctl_set(struct wg_softc *sc, struct
> 
>   /* Peer must have public key */
>   if (!(peer_o.p_flags & WG_PEER_HAS_PUBLIC))
> - continue;
> + goto next_peer;
> 
>   /* 0 = latest protocol, 1 = this protocol */
>   if (peer_o.p_protocol_version != 0) {
> @@ -2283,7 +2283,7 @@ wg_ioctl_set(struct wg_softc *sc, struct
>   /* Get local public and check that peer key doesn't match */
>   if (noise_local_keys(>sc_local, public, NULL) == 0 &&
>   bcmp(public, peer_o.p_public, WG_KEY_SIZE) == 0)
> - continue;
> + goto next_peer;
> 
>   /* Lookup peer, or create if it doesn't exist */
>   if ((peer = wg_peer_lookup(sc, peer_o.p_public)) == NULL) {
> @@ -2291,7 +2291,7 @@ wg_ioctl_set(struct wg_softc *sc, struct
>* Also, don't create a new one if we only want to
>* update. */
>   if (peer_o.p_flags & (WG_PEER_REMOVE|WG_PEER_UPDATE))
> - continue;
> + goto next_peer;
> 
>   if ((peer = wg_peer_create(sc,
>   peer_o.p_public)) == NULL) {
> @@ -2303,7 +2303,7 @@ wg_ioctl_set(struct wg_softc *sc, struct
>   /* Remove peer and continue if specified */
>   if (peer_o.p_flags & WG_PEER_REMOVE) {
>   wg_peer_destroy(peer);
> - continue;
> + goto next_peer;
>   }
> 
>   if (peer_o.p_flags & WG_PEER_HAS_ENDPOINT)
> @@ -2332,6 +2332,11 @@ wg_ioctl_set(struct wg_softc *sc, struct
>   aip_p++;
>   }
> 
> + peer_p = (struct wg_peer_io *)aip_p;
> + continue;
> +next_peer:
> + aip_p = &peer_p->p_aips[0];
> + aip_p += peer_o.p_aips_count;
>   peer_p = (struct wg_peer_io *)aip_p;
>   }
> 
> 



Re: broadcast simplex checksum

2021-01-31 Thread Richard Procter


>> On 20/01/2021, at 1:56 PM, Alexander Bluhm  wrote:
>> 
>> Hi,
>> 
>> Simplex interfaces reinject broadcast packets back into the IP
>> stack.  As this is a software features, no hardware checksumming
>> occurs.  So local broadcast packets are dropped with wrong checksum
>> if the underlying hardware supports checksumming.
>> 
>> Do software checksumming in ip_output() if the copy of a broadcast
>> packet will be delivered locally.  Put the logic into a separate
>> in_ifcap_cksum() function.
>> 
>> Found by regress/sys/kern/sosplice/loop which fails on some machines.

Hi, 

- Might the rule disabling checksum offload for broadcasts on IFF_SIMPLEX 
  interfaces be weakened to disable checksum offload for all broadcast 
  packets instead?

This simplifies the logic, and shouldn’t impact performance as 
0) IFCAP_CSUM_* /\ !IFF_SIMPLEX devices appear to be rare[0] and in any case 
1) broadcasts are not high-throughput. It should be safe, as broadcast 
checksums across bridges are already done in software.

(I wonder if IFF_SIMPLEX is a relic of another age and deserves to be removed
at some point; the rare !IFF_SIMPLEX device drivers could presumably filter
out their own received broadcasts. The rest of the stack would then have a 
simpler invariant to work with.)

- what motivates the new '!m->m_pkthdr.pf.routed’ term?

best, 
Richard. 

[0] A coarse test for IFF_BROADCAST /\ !IFF_SIMPLEX devices

/usr/src/sys $ grep -r IFF_BROADCAST | grep -v IFF_SIMPLEX | grep ' = '

arch/octeon/dev/if_ogx.c:   ifp->if_flags = IFF_BROADCAST | IFF_MULTICAST;
arch/sgi/hpc/if_sq.c:   ifp->if_flags = IFF_BROADCAST | IFF_MULTICAST;

net/if_bpe.c:   ifp->if_flags = IFF_BROADCAST | IFF_MULTICAST;
net/if_wg.c:ifp->if_flags = IFF_BROADCAST | IFF_MULTICAST | IFF_NOARP;
net/if_vlan.c:  ifp->if_flags = IFF_BROADCAST | IFF_MULTICAST;




>> 
>> ok?
>> 
>> bluhm
>> 
>> Index: netinet/ip_output.c
>> ===
>> RCS file: /mount/openbsd/cvs/src/sys/netinet/ip_output.c,v
>> retrieving revision 1.361
>> diff -u -p -r1.361 ip_output.c
>> --- netinet/ip_output.c  16 Jan 2021 07:58:12 -  1.361
>> +++ netinet/ip_output.c  20 Jan 2021 00:27:12 -
>> @@ -79,6 +79,7 @@ void ip_mloopback(struct ifnet *, struct
>> static __inline u_int16_t __attribute__((__unused__))
>>in_cksum_phdr(u_int32_t, u_int32_t, u_int32_t);
>> void in_delayed_cksum(struct mbuf *);
>> +int in_ifcap_cksum(struct mbuf *, struct ifnet *, int);
>> 
>> #ifdef IPSEC
>> struct tdb *
>> @@ -458,8 +459,7 @@ sendit:
>>   */
>>  if (ntohs(ip->ip_len) <= mtu) {
>>  ip->ip_sum = 0;
>> -if ((ifp->if_capabilities & IFCAP_CSUM_IPv4) &&
>> -(ifp->if_bridgeidx == 0))
>> +if (in_ifcap_cksum(m, ifp, IFCAP_CSUM_IPv4))
>>  m->m_pkthdr.csum_flags |= M_IPV4_CSUM_OUT;
>>  else {
>>  ipstat_inc(ips_outswcsum);
>> @@ -716,9 +716,7 @@ ip_fragment(struct mbuf *m, struct ifnet
>>  m->m_pkthdr.ph_ifidx = 0;
>>  mhip->ip_off = htons((u_int16_t)mhip->ip_off);
>>  mhip->ip_sum = 0;
>> -if ((ifp != NULL) &&
>> -(ifp->if_capabilities & IFCAP_CSUM_IPv4) &&
>> -(ifp->if_bridgeidx == 0))
>> +if (in_ifcap_cksum(m, ifp, IFCAP_CSUM_IPv4))
>>  m->m_pkthdr.csum_flags |= M_IPV4_CSUM_OUT;
>>  else {
>>  ipstat_inc(ips_outswcsum);
>> @@ -737,9 +735,7 @@ ip_fragment(struct mbuf *m, struct ifnet
>>  ip->ip_len = htons((u_int16_t)m->m_pkthdr.len);
>>  ip->ip_off |= htons(IP_MF);
>>  ip->ip_sum = 0;
>> -if ((ifp != NULL) &&
>> -(ifp->if_capabilities & IFCAP_CSUM_IPv4) &&
>> -(ifp->if_bridgeidx == 0))
>> +if (in_ifcap_cksum(m, ifp, IFCAP_CSUM_IPv4))
>>  m->m_pkthdr.csum_flags |= M_IPV4_CSUM_OUT;
>>  else {
>>  ipstat_inc(ips_outswcsum);
>> @@ -1849,15 +1845,15 @@ in_proto_cksum_out(struct mbuf *m, struc
>>  }
>> 
>>  if (m->m_pkthdr.csum_flags & M_TCP_CSUM_OUT) {
>> -if (!ifp || !(ifp->if_capabilities & IFCAP_CSUM_TCPv4) ||
>> -ip->ip_hl != 5 || ifp->if_bridgeidx != 0) {
>> +if (!in_ifcap_cksum(m, ifp, IFCAP_CSUM_TCPv4) ||
>> +ip->ip_hl != 5) {
>>  tcpstat_inc(tcps_outswcsum);
>>  in_delayed_cksum(m);
>>  m->m_pkthdr.csum_flags &= ~M_TCP_CSUM_OUT; /* Clear */
>>  }
>>  } else if (m->m_pkthdr.csum_flags & M_UDP_CSUM_OUT) {
>> -if (!ifp || !(ifp->if_capabilities & IFCAP_CSUM_UDPv4) ||
>> -ip->ip_hl != 5 || ifp->if_bridgeidx != 0) {
>> +if (!in_ifcap_cksum(m, ifp, IFCAP_CSUM_UDPv4) ||
>> +ip->ip_hl != 5) {
>>  udpstat_inc(udps_outswcsum);
>>  in_delayed_cksum(m);
>>