Re: [Patch] pf refactoring

2015-08-17 Thread Martin Pieuchot
Hello Richard,

On 17/08/15(Mon) 17:39, Richard Procter wrote:
 Hi, 
 
 This series of 29 small diffs slims pf.o by 2640 bytes and pf.c by
 113 non-comment lines.

We generally discuss one diff per mail.  It makes it easier for people
to comment and as you can imagine deal with possible conflicts in their
tree :)

As you might now, Sasha has some upcoming changes with pending review :)

 pf_translate(), in particular, is now much shorter and clearer[0].
 [...]
 [0] As pf_translate() is called only on state creation it isn't
 terribly performance sensitive, either.

This is a valid point, out of curiosity what drove you in this part of
the code?

Martin



Re: [Patch] pf refactoring

2015-08-17 Thread Martin Pieuchot
On 17/08/15(Mon) 13:38, Alexey Suslikov wrote:
 Martin Pieuchot mpi at openbsd.org writes:
 
  On 17/08/15(Mon) 17:39, Richard Procter wrote:
   Hi, 
   
   This series of 29 small diffs slims pf.o by 2640 bytes and pf.c by
   113 non-comment lines.
  
  We generally discuss one diff per mail.  It makes it easier for people
  to comment and as you can imagine deal with possible conflicts in their
  tree :)
 
 As far as I understood, Richard provided step-by-step refactor diffs.
 
 What you want to discuss is a cumulative diff he referred to.

No, what I want to discuss is one diff at a time not 29.  That doesn't
mean it has to be a cumulative diff.



Re: [Patch] pf refactoring

2015-08-17 Thread Alexey Suslikov
Martin Pieuchot mpi at openbsd.org writes:

 On 17/08/15(Mon) 17:39, Richard Procter wrote:
  Hi, 
  
  This series of 29 small diffs slims pf.o by 2640 bytes and pf.c by
  113 non-comment lines.
 
 We generally discuss one diff per mail.  It makes it easier for people
 to comment and as you can imagine deal with possible conflicts in their
 tree :)

As far as I understood, Richard provided step-by-step refactor diffs.

What you want to discuss is a cumulative diff he referred to.



[Patch] pf refactoring

2015-08-16 Thread Richard Procter
Hi, 

This series of 29 small diffs slims pf.o by 2640 bytes and pf.c by
113 non-comment lines.

pf_translate(), in particular, is now much shorter and clearer[0].

I've tested it by running on my home router (alix, i386, inet4) for a
week or so without issue, and by profiling it under stress. I saw no
significant performance impact, and it's, possibly, even a little more
efficient. See below for the profile details.

The patch's length is due to its conservatism; the cumulative patch 
is much smaller.

At each step, I attempted a correctness argument and some of these follow
the calculational approach of Dijkstra and others. For background, see
e.g. EWD1300 and EWD1123 (for the interested, I've found A.J.M van
Gasteren's 'On the shape of mathematical arguments' a stellar read, and
also recommend EWD1023). I've found it a useful exercise, particularly
as the effort needed to make the argument indicates the effort required
to review the diff, and several times I was led to take smaller bites.


best, 
Richard. 

PS. If you, reading this, like the patch, or even if you don't, 
please drop me a line to let me know. Thanks! 

[0] As pf_translate() is called only on state creation it isn't
terribly performance sensitive, either.

Profiling -

Setup: netcat a 415MB file through the profiled router. 

  [A] -- 192.168.3.2 vr1 -- [router] -- vr0 192.168.1.2 -- [B]

(vr0 is part of a bridge.)

pf.conf contains
match on vr1 scrub (random-id, reassemble tcp)
match out on vr0 inet from !vr0:network nat-to $router
pass

before patch: real 1m18.118s 
 after patch: real 1m17.282s, 1m18.150s, 1m17.716s

before patch:

1.044.92 1303400/1303400pf_test [9]
[13] 5.91.044.92 1303400pf_test_state [13]
0.892.10 1301332/1301332pf_tcp_track_full [19]
0.360.69 1303400/1304361pf_find_state [43]
0.290.06 1301273/1951905m_copyback [63]
0.200.00 3257446/5872085pf_addrcpy [74]
0.080.08  650646/650647 pf_change_a [93]
0.100.06  650646/650647 pf_change_ap [95]

after patch:
(note, profiler spent less time in the idle loop before the test began, 
so the time percentages 5.9, 6.9 aren't comparable between these runs.)

0.964.88 1301679/1301679 pf_test [7]
[13] 6.90.964.88 1301679 pf_test_state [13]
0.891.99 1301628/1301628 pf_tcp_track_full [19]
0.390.59 1301679/1301680 pf_find_state [44]
0.190.16 1301580/1301582 pf_change_a [74]
0.280.05 1301576/1952366 m_copyback [64]
0.130.06 1301580/1301582 pf_change_16 [87]
0.150.00 2603358/5857533 pf_addrcpy [75]

Patch 

To apply against HEAD (pf.c:1.935, pfvar.h:1.419):

First apply the checksum patch I posted to tech@ 
Re: [patch] cleaner checksum modification for pf, 
Message-Id: 98bb5cd8-7b74-4a79-a333-70749e3bc...@gmail.com. 
(I'm also happy to regenerate what follows directly against HEAD 
if necessary.)

then, 

# cd /usr/src/sys
# cat - | patch 

- add pf_change_p(), change port, and use instead of pf_change_ap() when
 changing port only.

No functional change: all replaced pf_change_ap() leave address unchanged.

Argument:

(note: afto == (pd-af != nk-af))

All replaced pf_change_ap() executed under afto with arg naf := nk-af   
=   { pf_change_ap() } 
All replaced executions of pf_change_a() executed 
with args af := pd-af, naf := nk-af, under afto which implies af != naf 
=   { afto and pf_change_a() guarded with 'if (af != naf) return;' }
All replaced executions of pf_change_a() are no-op 
=   { pf_change_ap() } 
All replaced pf_change_ap() leave address unchanged

[Context=15]  

Index: net/pf.c
===
--- net.orig/pf.c
+++ net/pf.c
@@ -139,30 +139,31 @@ union pf_headers {

struct pool  pf_src_tree_pl, pf_rule_pl, pf_queue_pl;
struct pool  pf_state_pl, pf_state_key_pl, pf_state_item_pl;
struct pool  pf_rule_item_pl, pf_sn_item_pl;

void pf_init_threshold(struct pf_threshold *, u_int32_t,
u_int32_t);
void pf_add_threshold(struct pf_threshold *);
int  pf_check_threshold(struct pf_threshold *);
void pf_cksum_fixup(u_int16_t *, u_int16_t, u_int16_t,
u_int8_t);
void pf_cksum_fixup_a(u_int16_t *, const struct pf_addr *,
const struct pf_addr *, sa_family_t, u_int8_t);
void pf_change_32(struct pf_pdesc *, u_int32_t *,