Hi,

These patches against HEAD reinstate the pf algorithm of OpenBSD 5.4
for preserving payload checksums end-to-end, rewritten without the
ugly and error-prone (but speedy!) nested pf_cksum_fixup calls.
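
As a rough illustration of what preserving a checksum means here (this
is a standalone sketch, not code from the patch; all demo_* names are
hypothetical), the stored checksum is adjusted by the difference
between the old and new value of each changed 16-bit word instead of
being regenerated over the whole payload. The folding step mirrors the
one used by pf_cksum_fixup in the second patch, minus its UDP special
cases:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference: full ones-complement checksum over n 16-bit words. */
uint16_t
demo_cksum_full(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += w[i];
	while (sum >> 16)
		sum = (sum >> 16) + (sum & 0xffff);
	return (uint16_t)~sum;
}

/* Incremental update after one word changes from 'was' to 'now'. */
uint16_t
demo_cksum_fixup(uint16_t cksum, uint16_t was, uint16_t now)
{
	uint32_t l = (uint32_t)cksum + was - now;

	/* fold the carries (or borrows) held in the upper 16 bits */
	l = ((l >> 16) + (l & 0xffff)) & 0xffff;
	return (uint16_t)l;
}

/* 0x0000 and 0xffff both represent zero in ones-complement. */
int
demo_cksum_equal(uint16_t a, uint16_t b)
{
	return (a % 0xffff) == (b % 0xffff);
}

/* Returns 1 if the incremental result agrees with a full recompute. */
int
demo_fixup_matches_recompute(void)
{
	uint16_t w[4] = { 0x4500, 0x1234, 0xc0a8, 0x0001 };
	uint16_t ck = demo_cksum_full(w, 4);
	uint16_t was = w[2];

	w[2] = 0x0a00;	/* e.g. rewrite half of a translated address */
	ck = demo_cksum_fixup(ck, was, w[2]);
	return demo_cksum_equal(ck, demo_cksum_full(w, 4));
}
```

The point of the incremental form is that an error already present in
the payload stays detectable by the receiver, since only the delta of
the rewritten fields is applied to the checksum.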

I have been running this code on a small Alix (i386) IPv4 gateway for a
month with no obvious issues. To test as many of the affected features
as possible, its pf.conf included:

  match scrub (random-id) 
  match on egress scrub (max-mss 1440, reassemble tcp)
  match out on egress from !egress:network nat-to egress:0 
  pass out on egress inet proto tcp modulate state 
  pass in inet proto tcp from any to egress port ... rdr-to ...

I've tried to avoid significant performance impact on modern hardware,
and don't expect any, but have not tested this. I've aimed for
simplicity in the first instance and there is scope for optimisation if
necessary.

I've attached my test notes below, covering every change. Note:

 - I was unable to test af-to, which does a lot of packet fiddling.
   I've never used it before and was unable to get it working on a 
   generic kernel. I figure I'm just missing something. I used the line

   pass out on vr0 inet af-to inet6 from fec0:0:0:2::1 to fec0:0:0:2::2

   but although IPv4 TCP connection attempts were translated
   to fec0:0:0:2::2, its SYN replies received RST from the
   router, fec0:0:0:2::1.

 - My inet6 testing was limited to two nodes connected via the alix
   router.

 - I've assumed that "rdr-to" is good if "nat-to" tests ok as the code
   paths look virtually identical.

 - I haven't tested modification of unaligned TCP options,
   for SACK and timestamp, but have tested the unaligned paths of the
   change primitives.

 - The patch includes a small fix to pf_pdesc_setup so that it sets up
   the pdesc protocol checksum for ICMPv6.

The patch is in three commits to ease review, to be applied in order:

 0. rename pf_change_a -> pf_change_32_unaligned to better reflect its
 use 
 1. reinstate pf_cksum_fixup sans nesting 
 2. avoid unnecessary calls to pf_change_32_unaligned

...but if another format is easier, let me know.

This patch should be examined closely: it's been a while since I've
worked at this level and I've never worked on OpenBSD code.

I'm keen to hear comments, questions or criticisms.

best, 
Richard.  

 - [G] means that errors here are expected to have been exposed in the
   course of running on the gateway.

 - [O] means these changes involve primitives tested elsewhere.

pf_tcp_track_full, pf_create_state [G]
        - modulate sequence number  ("modulate state")
             "pf_change_a", old unaligned copy -> pf_change_32
             new pf_change_a
                
pf_translate 
        - translate address and port, if any 
                AF_INET [G]
                AF_INET6 [TESTED between two addresses UDP, TCP, ICMP6]
                     
        - translate ICMP icmp_id for ICMP_ECHO [TESTED]
            pf_change_16        
        - translate ICMP6 icmp6_id for ICMP6_ECHO [TESTED]
            pf_change_16
        - translate address for non TCP,UDP,ICMP,ICMP6 protocol [TESTED]

             "pf_change_a", old unaligned copy -> pf_change_32
            new pf_change_a
        
pf_modulate_sack 
        - modulate SACK sequence numbers [G]
                pf_change_32_unaligned [need to test unaligned options]
                minor refactoring to support fixup

pf_test_state_icmp
        - translate ICMP unrelated to another connection, e.g ECHO [TESTED]
                - translate icmp_id
                - translate address
                        new pf_change_a 

        - ICMP error related to another connection, e.g dest unreachable
                - translate address, quoted address, port ("nat", "rdr-to") 
                        via pf_change_icmp, see below
                        
                - translate quoted address family ("af-to") [*]
                        via pf_change_icmp_af
                        pf_change_ap

                TCP 
                   - demodulate quoted TCP sequence number [TESTED]
                        "pf_change_a", old unaligned -> pf_change_32
                UDP [TESTED]
                   - zero quoted UDP checksum
                        pf_change_16

pf_change_icmp [O]
        - change quoted protocol port and address, if any 
        - change outer ip address 
                pf_cksum_fixup_a
                pf_cksum_fixup
                pf_change_a                                   

pf_change_icmp_af
        - replace quoted IPv4 / IPv6 headers with converse AF. 
                pf_cksum_cover / uncover

pf_translate_icmp_af [O] 
        - to AF_INET
                - sets icmp type, code
                  nextmtu for ICMP6_PACKET_TOO_BIG
                  pptr for ICMP_PARAMPROB
        - to AF_INET6
                - sets icmp type, code
                  nextmtu for ICMP_UNREACH_NEEDFRAG
                  ptr for ICMP_PARAMPROB

        pf_change_8 [TESTED when testing alternate pf_change_32_unaligned
                codepath for SACK options]
        pf_change_32 [TESTED by modulate state] 
        pf_change_16 [TESTED by icmp echo and zeroed UDP sum] 
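
The unaligned path mentioned above can be sketched as follows. This is
a simplified standalone illustration, not the patch's code: the
sketch_* names are hypothetical, and the checksum is kept as a plain
number with words read in network order, so no htons() is needed. A
byte at an even packet offset is the high byte of its checksum word
and a byte at an odd offset is the low byte, which is the role the
"hi" argument plays in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Checksum over len bytes (len even), words read in network order. */
uint16_t
sketch_cksum(const uint8_t *b, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)b[i] << 8) | b[i + 1];
	while (sum >> 16)
		sum = (sum >> 16) + (sum & 0xffff);
	return (uint16_t)~sum;
}

/* Replace one byte; an even offset is the high byte of its word. */
void
sketch_change_8(uint16_t *ck, uint8_t *b, size_t off, uint8_t v)
{
	int hi = (off % 2) == 0;
	uint16_t was = hi ? (uint16_t)(b[off] << 8) : b[off];
	uint16_t now = hi ? (uint16_t)(v << 8) : v;
	uint32_t l = (uint32_t)*ck + was - now;

	l = ((l >> 16) + (l & 0xffff)) & 0xffff;
	*ck = (uint16_t)l;
	b[off] = v;
}

/* Replace four bytes at any offset, one byte at a time. */
void
sketch_change_32_unaligned(uint16_t *ck, uint8_t *b, size_t off,
    const uint8_t v[4])
{
	size_t i;

	for (i = 0; i < 4; i++)
		sketch_change_8(ck, b, off + i, v[i]);
}

/* Returns 1 if a change at an odd offset agrees with a recompute. */
int
sketch_unaligned_matches_recompute(void)
{
	uint8_t pkt[12] = { 0x01, 0x01, 0x05, 0x0a, 0x00, 0x00,
	    0x10, 0x00, 0x00, 0x00, 0x20, 0x00 };
	const uint8_t v[4] = { 0xde, 0xad, 0xbe, 0xef };
	uint16_t ck = sketch_cksum(pkt, sizeof(pkt));
	uint16_t full;

	sketch_change_32_unaligned(&ck, pkt, 3, v);	/* odd offset */
	full = sketch_cksum(pkt, sizeof(pkt));
	/* 0x0000 and 0xffff are the same ones-complement value */
	return (ck % 0xffff) == (full % 0xffff);
}
```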

pf_normalize_tcp 
        - zeros non-zero th_x2 [TESTED]
                reinstates old hack
        - On SYN & FIN, clears FIN [TESTED]
        - On ~URG, zeros urgent pointer [TESTED]

pf_normalize_tcp_stateful
        - modulate timestamp (on "reassemble tcp") [G]
                pf_change_32_unaligned
                refactoring to support fixup 

pf_normalize_mss [TESTED]
        - mss clamping 
                pf_change_16_unaligned
                refactoring to support fixup
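
The primitives listed above update network-order fields without
byte-swapping them first. That relies on the property argued in the
comment above pf_cksum_fixup in the second patch: byte-swapping
commutes with 16-bit ones-complement addition. A small standalone
check of that property over a sample of values (oc_add, swap16 and
oc_swap_commutes are hypothetical names, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit ones-complement addition with end-around carry. */
uint16_t
oc_add(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;

	s = (s & 0xffff) + (s >> 16);
	return (uint16_t)s;
}

/* Exchange the two bytes of a 16-bit value. */
uint16_t
swap16(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}

/* Returns 1 if swap16(a +_1 b) == swap16(a) +_1 swap16(b) on a sample. */
int
oc_swap_commutes(void)
{
	uint32_t a, b;

	for (a = 0; a <= 0xffff; a += 251) {
		for (b = 0; b <= 0xffff; b += 257) {
			uint16_t x = (uint16_t)a, y = (uint16_t)b;

			if (swap16(oc_add(x, y)) !=
			    oc_add(swap16(x), swap16(y)))
				return 0;
		}
	}
	/* edge case: both operands all-ones */
	return swap16(oc_add(0xffff, 0xffff)) ==
	    oc_add(swap16(0xffff), swap16(0xffff));
}
```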

===============================================================================
Patches:
===============================================================================
rename pf_change_a -> pf_change_32_unaligned to better reflect its use
--- pf.c.orig   Wed Jun  3 13:49:28 2015
+++ pf.c        Thu Jun  4 13:27:24 2015
@@ -1666,7 +1666,7 @@
 
 /* Changes a u_int32_t.  Uses a void * so there are no align restrictions */
 void
-pf_change_a(struct pf_pdesc *pd, void *a, u_int32_t an)
+pf_change_32_unaligned(struct pf_pdesc *pd, void *a, u_int32_t an)
 {
        if (pd->csum_status == PF_CSUM_UNKNOWN)
                pf_check_proto_cksum(pd, pd->off, pd->tot_len - pd->off,
@@ -2277,10 +2277,10 @@
                                for (i = 2; i + TCPOLEN_SACK <= olen;
                                    i += TCPOLEN_SACK) {
                                        memcpy(&sack, &opt[i], sizeof(sack));
-                                       pf_change_a(pd, &sack.start,
+                                       pf_change_32_unaligned(pd, &sack.start,
                                            htonl(ntohl(sack.start) -
                                            dst->seqdiff));
-                                       pf_change_a(pd, &sack.end,
+                                       pf_change_32_unaligned(pd, &sack.end,
                                            htonl(ntohl(sack.end) -
                                            dst->seqdiff));
                                        memcpy(&opt[i], &sack, sizeof(sack));
@@ -3499,7 +3499,7 @@
                        if ((s->src.seqdiff = pf_tcp_iss(pd) - s->src.seqlo) ==
                            0)
                                s->src.seqdiff = 1;
-                       pf_change_a(pd, &th->th_seq,
+                       pf_change_32_unaligned(pd, &th->th_seq,
                            htonl(s->src.seqlo + s->src.seqdiff));
                        *rewrite = 1;
                } else
@@ -3698,12 +3698,12 @@
 #endif /* INET6 */
                } else {
                        if (PF_ANEQ(saddr, pd->src, pd->af)) {
-                               pf_change_a(pd, &pd->src->v4.s_addr,
+                               pf_change_32_unaligned(pd, &pd->src->v4.s_addr,
                                    saddr->v4.s_addr);
                                rewrite = 1;
                        }
                        if (PF_ANEQ(daddr, pd->dst, pd->af)) {
-                               pf_change_a(pd, &pd->dst->v4.s_addr,
+                               pf_change_32_unaligned(pd, &pd->dst->v4.s_addr,
                                    daddr->v4.s_addr);
                                rewrite = 1;
                        }
@@ -3763,12 +3763,12 @@
                switch (pd->af) {
                case AF_INET:
                        if (!afto && PF_ANEQ(saddr, pd->src, pd->af)) {
-                               pf_change_a(pd, &pd->src->v4.s_addr,
+                               pf_change_32_unaligned(pd, &pd->src->v4.s_addr,
                                    saddr->v4.s_addr);
                                rewrite = 1;
                        }
                        if (!afto && PF_ANEQ(daddr, pd->dst, pd->af)) {
-                               pf_change_a(pd, &pd->dst->v4.s_addr,
+                               pf_change_32_unaligned(pd, &pd->dst->v4.s_addr,
                                    daddr->v4.s_addr);
                                rewrite = 1;
                        }
@@ -3831,8 +3831,8 @@
                        while ((src->seqdiff = arc4random() - seq) == 0)
                                ;
                        ack = ntohl(th->th_ack) - dst->seqdiff;
-                       pf_change_a(pd, &th->th_seq, htonl(seq + src->seqdiff));
-                       pf_change_a(pd, &th->th_ack, htonl(ack));
+                       pf_change_32_unaligned(pd, &th->th_seq, htonl(seq + src->seqdiff));
+                       pf_change_32_unaligned(pd, &th->th_ack, htonl(ack));
                        *copyback = 1;
                } else {
                        ack = ntohl(th->th_ack);
@@ -3883,8 +3883,8 @@
                ack = ntohl(th->th_ack) - dst->seqdiff;
                if (src->seqdiff) {
                        /* Modulate sequence numbers */
-                       pf_change_a(pd, &th->th_seq, htonl(seq + src->seqdiff));
-                       pf_change_a(pd, &th->th_ack, htonl(ack));
+                       pf_change_32_unaligned(pd, &th->th_seq, htonl(seq + src->seqdiff));
+                       pf_change_32_unaligned(pd, &th->th_ack, htonl(ack));
                        *copyback = 1;
                }
                end = seq + pd->p_len;
@@ -4550,12 +4550,12 @@
 #endif /* INET6 */
                                if (!afto && PF_ANEQ(pd->src,
                                    &nk->addr[sidx], AF_INET))
-                                       pf_change_a(pd, &saddr->v4.s_addr,
+                                       pf_change_32_unaligned(pd, &saddr->v4.s_addr,
                                            nk->addr[sidx].v4.s_addr);
 
                                if (!afto && PF_ANEQ(pd->dst,
                                    &nk->addr[didx], AF_INET)) {
-                                       pf_change_a(pd, &daddr->v4.s_addr,
+                                       pf_change_32_unaligned(pd, &daddr->v4.s_addr,
                                            nk->addr[didx].v4.s_addr);
                                        pd->destchg = 1;
                                }
@@ -4747,7 +4747,7 @@
                        /* Demodulate sequence number */
                        seq = ntohl(th.th_seq) - src->seqdiff;
                        if (src->seqdiff) {
-                               pf_change_a(pd, &th.th_seq, htonl(seq));
+                               pf_change_32_unaligned(pd, &th.th_seq, htonl(seq));
                                copyback = 1;
                        }
 
--- pfvar.h.orig        Thu Jun  4 13:46:49 2015
+++ pfvar.h             Wed Jun  3 09:15:16 2015
@@ -1813,7 +1813,7 @@
 
 void   *pf_pull_hdr(struct mbuf *, int, void *, int, u_short *, u_short *,
            sa_family_t);
-void   pf_change_a(struct pf_pdesc *, void *, u_int32_t);
+void   pf_change_32_unaligned(struct pf_pdesc *, void *, u_int32_t);
 int    pf_check_proto_cksum(struct pf_pdesc *, int, int, u_int8_t,
            sa_family_t);
 int    pflog_packet(struct pf_pdesc *, u_int8_t, struct pf_rule *,
--- pf_norm.c.orig      Thu Jun  4 13:55:10 2015
+++ pf_norm.c   Thu Jun  4 13:57:37 2015
@@ -1058,7 +1058,7 @@
                                            (src->scrub->pfss_flags &
                                            PFSS_TIMESTAMP)) {
                                                tsval = ntohl(tsval);
-                                               pf_change_a(pd, &opt[2],
+                                               pf_change_32_unaligned(pd, &opt[2],
                                                    htonl(tsval +
                                                    src->scrub->pfss_ts_mod));
                                                copyback = 1;
@@ -1072,7 +1072,7 @@
                                            PFSS_TIMESTAMP)) {
                                                tsecr = ntohl(tsecr)
                                                    - dst->scrub->pfss_ts_mod;
-                                               pf_change_a(pd, &opt[6],
+                                               pf_change_32_unaligned(pd, &opt[6],
                                                    htonl(tsecr));
                                                copyback = 1;
                                        }

===============================================================================
reinstate pf_cksum_fixup sans nesting
--- pf.c.000    Thu Jun  4 13:22:27 2015
+++ pf.c        Thu Jun  4 13:39:02 2015
@@ -140,7 +140,14 @@
                            u_int32_t);
 void                    pf_add_threshold(struct pf_threshold *);
 int                     pf_check_threshold(struct pf_threshold *);
-
+void                    pf_cksum_fixup(u_int16_t *, u_int16_t, u_int16_t,
+                           u_int8_t);
+void                    pf_cksum_fixup_a(u_int16_t *, const struct pf_addr *,
+                           const struct pf_addr *, sa_family_t, u_int8_t);
+void                    pf_change_32(struct pf_pdesc *, u_int32_t *,
+                           u_int32_t);
+void                    pf_change_a(struct pf_pdesc *, struct pf_addr *,
+                           struct pf_addr *, sa_family_t, sa_family_t);
 void                    pf_change_ap(struct pf_pdesc *, struct pf_addr *,
                            u_int16_t *, struct pf_addr *, u_int16_t,
                            sa_family_t);
@@ -157,7 +164,7 @@
                            struct pf_pdesc *, struct pf_pdesc *,
                            struct pf_addr *, struct pf_addr *, sa_family_t,
                            sa_family_t);
-int                     pf_translate_icmp_af(int, void *);
+int                     pf_translate_icmp_af(struct pf_pdesc*, int, void *);
 void                    pf_send_tcp(const struct pf_rule *, sa_family_t,
                            const struct pf_addr *, const struct pf_addr *,
                            u_int16_t, u_int16_t, u_int32_t, u_int32_t,
@@ -290,6 +297,8 @@
        struct pf_state_key *);
 static __inline int pf_state_compare_id(struct pf_state *,
        struct pf_state *);
+static __inline void pf_cksum_uncover(u_int16_t *, u_int16_t, u_int8_t);
+static __inline void pf_cksum_cover(u_int16_t *, u_int16_t, u_int8_t);
 
 struct pf_src_tree tree_src_tracking;
 
@@ -1651,37 +1660,248 @@
        }
 }
 
+/* This function, given arguments of one endian, is invariant over the
+ * endian of the host. Why?
+ *
+ * Define the unary transpose operator ~ on a bitstring via python slice
+ * notation as lambda m: m[X:] + m[:X] , for some constant X
+ *
+ * Th. ~ distributes over ones-complement addition, denoted by +_1, i.e.
+ * 
+ *     ~m +_1 ~n  =  ~(m +_1 n)    (for all bitstrings m,n of equal length)
+ *
+ * Proof. Regard the bitstrings in m +_1 n as split at X, forming at
+ * most two 'half-adds'. Under ones-complement addition, each half-add
+ * carries to the other, so the sum of each half-add is unaffected by
+ * their relative order. Therefore:
+ *
+ *     ~m +_1 ~n
+ *   =    { half-adds invariant under transposition }
+ *     ~s
+ *   =    { substitute }
+ *     ~(m +_1 n)                   [end of proof]
+ *
+ * Th. Summing two in-memory ones-complement 16-bit variables m,n
+ * on a machine with the converse endian does not alter the result.
+ *
+ * Proof.
+ *        { converse machine endian: load/store transposes, X := 8 }
+ *     ~(~m +_1 ~n)
+ *   =    { ~ over +_1 }
+ *     ~~m +_1 ~~n
+ *   =    { ~ is an involution }
+ *      m +_1 n                     [end of proof]
+ */
+void 
+pf_cksum_fixup(u_int16_t *cksum, u_int16_t was, u_int16_t now,
+    u_int8_t proto)
+{
+       u_int32_t l;
+       const int udp = proto == IPPROTO_UDP;
+
+       if (udp && *cksum == 0x0000)
+               return; 
+
+       l = *cksum + was - now;
+       l = ((l >> 16) + (l & 0xffff))  &  0xffff; 
+
+       if (udp && l == 0x0000) 
+               l = 0xffff; 
+
+        *cksum = (u_int16_t)(l); 
+}
+
+/* pre: coverage(cksum) covers coverage(cksum_covered) */
+static __inline void
+pf_cksum_uncover(u_int16_t *cksum, u_int16_t cksum_covered, u_int8_t proto)
+{
+       pf_cksum_fixup(cksum, ~cksum_covered, 0x0, proto);
+}
+
+/* pre: disjoint(coverage(cksum), coverage(cksum_uncovered)) */
+static __inline void
+pf_cksum_cover(u_int16_t *cksum, u_int16_t cksum_uncovered, u_int8_t proto)
+{
+       pf_cksum_fixup(cksum, 0x0, ~cksum_uncovered, proto);
+}
+
+/* pre: changes are 16-bit aligned within the packet 
+ * 
+ * We emulate 16-bit ones-complement arithmetic by conserving its carries, 
+ * which twos-complement otherwise discards, in the upper 16 bits of l. 
+ * These accumulated carries when added to the lower 16-bits then 
+ * complete the ones-complement sum.
+ *
+ * Note, the accumulator, despite l being unsigned, supports net-negative
+ * carries:
+ *
+ * Arithmetic or assignment on n unsigned bits is modulo 2^n.
+ * Def. x mod y  =  x - (x//y)*y   for integer x,y 
+ *
+ * Th. (x + (y mod z)) mod z
+ *    =  { def mod }
+ *     (x + y - (y//z)*z) mod z
+ *    =  { (x + y*z) mod z = x mod z }
+ *     (x + y) mod z   (0)
+ *
+ * Now, the value of the unsigned m-bit accumulator having assigned 
+ * integer x to it is (x mod 2^m). Added to the sum, we have:
+ * 
+ *   (sum + (x mod 2^m)) mod 2^n
+ * =     { accumulator same width as sum; m = n }
+ *   (sum + (x mod 2^n)) mod 2^n
+ * =     { (0) }
+ *   (sum + x) mod 2^n
+ * 
+ * ... and when x < 0 this equals (sum - |x|) mod 2^n
+ *
+ * The scheme is therefore correct over a range of at least plus or
+ * minus 2^16 - 1 accumulated carries, after which the accumulator
+ * wraps. This far exceeds the worst case below of plus or minus 8.
+ */
 void
+pf_cksum_fixup_a(u_int16_t *cksum, const struct pf_addr *a,
+    const struct pf_addr *an, sa_family_t af, u_int8_t proto)
+{
+       u_int32_t        l;
+       const u_int16_t *n = an->addr16;
+       const u_int16_t *o = a->addr16;
+       const int        udp = proto == IPPROTO_UDP;
+
+       switch (af) {
+       case AF_INET:
+               l = *cksum + o[0] - n[0] + o[1] - n[1];
+               break;
+#ifdef INET6
+       case AF_INET6:
+               l = *cksum + o[0] - n[0] + o[1] - n[1] + o[2] - n[2] + 
+                   o[3] - n[3] + o[4] - n[4] + o[5] - n[5] + o[6] -
+                   n[6] + o[7] - n[7];
+               break;
+#endif /* INET6 */
+       }
+
+       if (udp && *cksum == 0x0000)
+               return;
+
+       l = ((l >> 16) + (l & 0xffff))  &  0xffff;
+
+       if (udp && l == 0x0000)
+               l = 0xffff;
+
+       *cksum = (u_int16_t)(l);
+}
+
+void
+pf_change_8(struct pf_pdesc *pd, u_int8_t *f, u_int8_t v, bool hi)
+{
+       u_int16_t new = hi ? ( v << 8) :  v; 
+       u_int16_t old = hi ? (*f << 8) : *f; 
+
+       pf_cksum_fixup(pd->pcksum, htons(old), htons(new), pd->proto); 
+       *f = v; 
+}
+
+/* pre: *f is 16-bit aligned within its packet */
+void 
+pf_change_16(struct pf_pdesc *pd, u_int16_t *f, u_int16_t v)
+{
+       pf_cksum_fixup(pd->pcksum, *f, v, pd->proto); 
+       *f = v; 
+} 
+
+void
+pf_change_16_unaligned(struct pf_pdesc *pd, void *f, u_int16_t v, bool hi)
+{
+       u_int8_t *fb = (u_int8_t*)f; 
+       u_int8_t *vb = (u_int8_t*)&v;
+
+       if (hi && ALIGNED_POINTER(f, u_int16_t)) {
+               pf_change_16(pd, f, v); /* optimise */ 
+               return;
+       }
+
+       pf_change_8(pd, fb++, *vb++, hi);
+       pf_change_8(pd, fb++, *vb++,!hi);
+}
+
+/* pre: *f is 16-bit aligned within its packet */
+void
+pf_change_32(struct pf_pdesc *pd, u_int32_t *f, u_int32_t v)
+{
+       u_int16_t *pc = pd->pcksum;
+
+       pf_cksum_fixup(pc, *f / (1 << 16), v / (1 << 16), pd->proto);
+       pf_cksum_fixup(pc, *f % (1 << 16), v % (1 << 16), pd->proto);
+       *f = v;
+}
+
+/* pre: *a is 16-bit aligned within its packet */
+void
+pf_change_a(struct pf_pdesc *pd, struct pf_addr *a, struct pf_addr *an, 
+           sa_family_t af, sa_family_t naf)
+{
+       static struct pf_addr    zero;
+       u_int16_t               *pc = pd->pcksum; 
+
+       switch (pd->proto) {
+       case IPPROTO_TCP:       /* FALLTHROUGH */ 
+       case IPPROTO_UDP:       
+       case IPPROTO_ICMPV6: 
+               if (af == naf) {
+                       pf_cksum_fixup_a(pc, a, an, af, pd->proto);
+               }
+               else {
+                       /* "cksum_uncover_a(pc, a)" */
+                       /* "cksum_cover_a(pc, an)" */
+                       pf_cksum_fixup_a(pc, a, &zero, af, pd->proto);
+                       pf_cksum_fixup_a(pc, &zero, an, naf, pd->proto);
+               }
+               break; 
+
+       case IPPROTO_ICMP:      /* ICMPv4 has no pseudo-header */
+       default: 
+               break; 
+       }
+       
+       if (af == naf)
+               PF_ACPY(a, an, naf);
+}
+
+void
 pf_change_ap(struct pf_pdesc *pd, struct pf_addr *a, u_int16_t *p,
     struct pf_addr *an, u_int16_t pn, sa_family_t naf)
 {
-       if (pd->csum_status == PF_CSUM_UNKNOWN)
-               pf_check_proto_cksum(pd, pd->off, pd->tot_len - pd->off,
-                   pd->proto, pd->af);
-       if (pd->af == naf)
-               PF_ACPY(a, an, naf);
-       if (p != NULL)
+       if (p != NULL) {
+               pf_cksum_fixup(pd->pcksum, *p, pn, pd->proto);
                *p = pn;
+       }
+
+       pf_change_a(pd, a, an, pd->af, naf);
 }
 
-/* Changes a u_int32_t.  Uses a void * so there are no align restrictions */
 void
-pf_change_32_unaligned(struct pf_pdesc *pd, void *a, u_int32_t an)
+pf_change_32_unaligned(struct pf_pdesc *pd, void *f, u_int32_t v, bool hi)
 {
-       if (pd->csum_status == PF_CSUM_UNKNOWN)
-               pf_check_proto_cksum(pd, pd->off, pd->tot_len - pd->off,
-                   pd->proto, pd->af);
-       memcpy(a, &an, sizeof(u_int32_t));
+       u_int8_t *fb = (u_int8_t*)f; 
+       u_int8_t *vb = (u_int8_t*)&v; 
+
+       if (hi && ALIGNED_POINTER(f, u_int32_t)) {
+               pf_change_32(pd, f, v); /* optimise */ 
+               return;
+       }
+
+       pf_change_8(pd, fb++, *vb++, hi);
+       pf_change_8(pd, fb++, *vb++,!hi);
+       pf_change_8(pd, fb++, *vb++, hi);
+       pf_change_8(pd, fb++, *vb++,!hi);
 }
 
 #ifdef INET6
 void
 pf_change_a6(struct pf_pdesc *pd, struct pf_addr *a, struct pf_addr *an)
 {
-       if (pd->csum_status == PF_CSUM_UNKNOWN)
-               pf_check_proto_cksum(pd, pd->off, pd->tot_len - pd->off,
-                   pd->proto, pd->af);
-       PF_ACPY(a, an, AF_INET6);
+       pf_change_a(pd, a, an, AF_INET6, AF_INET6);
 }
 #endif /* INET6 */
 
@@ -1865,24 +2085,27 @@
        return (0);  /* These types match to their own state */
 }
 
+/* pf_change_ap_icmp, allow for af/ ? */ 
 void
 pf_change_icmp(struct pf_pdesc *pd, struct pf_addr *ia, u_int16_t *ip,
     struct pf_addr *oa, struct pf_addr *na, u_int16_t np, sa_family_t af)
 {
-       if (pd->csum_status == PF_CSUM_UNKNOWN)
-               pf_check_proto_cksum(pd, pd->off, pd->tot_len - pd->off,
-                   pd->proto, pd->af);
+        /* note: doesn't trouble to fixup quoted checksums, if any */ 
 
-       /* Change inner protocol port */
-       if (ip != NULL)
+       /* change quoted protocol port */
+       if (ip != NULL) {
+               pf_cksum_fixup(pd->pcksum, *ip, np, pd->proto);
                *ip = np;
+       }
 
-       /* Change inner ip address */
+       /* change quoted ip address */
+       pf_cksum_fixup_a(pd->pcksum, ia, na, af, pd->proto);
        PF_ACPY(ia, na, af);
 
-       /* Outer ip address, fix outer icmpv6 checksum, if necessary. */
-       if (oa)
-               PF_ACPY(oa, na, af);
+       /* change outer ip address */
+       if (oa) {
+               pf_change_a(pd, oa, na, af, af);
+       }
 }
 
 #if INET && INET6
@@ -1895,10 +2118,6 @@
        struct icmp6_hdr        *icmp;
        int                      hlen;
 
-       if (pd->csum_status == PF_CSUM_UNKNOWN)
-               pf_check_proto_cksum(pd, pd->off, pd->tot_len - pd->off,
-                   pd->proto, pd->af);
-
        hlen = pd->naf == AF_INET ? sizeof(*ip4) : sizeof(*ip6);
 
        /* trim the old header */
@@ -1970,31 +2189,27 @@
        struct ip6_hdr          *ip6;
        int                      hlen, olen, mlen;
 
-       if (pd->csum_status == PF_CSUM_UNKNOWN)
-               pf_check_proto_cksum(pd, pd->off, pd->tot_len - pd->off,
-                   pd->proto, pd->af);
-
        if (af == naf || (af != AF_INET && af != AF_INET6) ||
            (naf != AF_INET && naf != AF_INET6))
                return (-1);
 
-       /* split the mbuf chain on the inner ip/ip6 header boundary */
+       /* split the mbuf chain on the quoted ip/ip6 header boundary */
        if ((n = m_split(m, off, M_DONTWAIT)) == NULL)
                return (-1);
 
-       /* old header */
+       /* old quoted header */
        olen = pd2->off - off;
-       /* new header */
+       /* new quoted header */
        hlen = naf == AF_INET ? sizeof(*ip4) : sizeof(*ip6);
 
-       /* trim old header */
+       /* trim old quoted header */
+        pf_cksum_uncover(pd->pcksum, in_cksum(n, olen << 2), pd->proto);
        m_adj(n, olen);
 
-       /* prepend a new one */
+       /* prepend a new, translated, quoted header */
        if ((M_PREPEND(n, hlen, M_DONTWAIT)) == NULL)
                return (-1);
 
-       /* translate inner ip/ip6 header */
        switch (naf) {
        case AF_INET:
                ip4 = mtod(n, struct ip *);
@@ -2035,11 +2250,16 @@
        pd2->off += hlen - olen;
        pd->tot_len += hlen - olen;
 
-       /* merge modified inner packet with the original header */
+       /* reattach modified quoted packet to outer header */
        mlen = n->m_pkthdr.len;
        m_cat(m, n);
        m->m_pkthdr.len += mlen;
 
+        /* optimise: any new AF_INET header of ours sums to zero */
+        if (naf != AF_INET) {
+            pf_cksum_cover(pd->pcksum, in_cksum(n, hlen << 2), pd->proto);
+        }
+
        return (0);
 }
 
@@ -2048,7 +2268,7 @@
 #define PTR_IP6(field) (offsetof(struct ip6_hdr, field))
 
 int
-pf_translate_icmp_af(int af, void *arg)
+pf_translate_icmp_af(struct pf_pdesc *pd, int af, void *arg)
 {
        struct icmp             *icmp4;
        struct icmp6_hdr        *icmp6;
@@ -2136,13 +2356,16 @@
                default:
                        return (-1);
                }
-               icmp6->icmp6_type = type;
-               icmp6->icmp6_code = code;
+
+               pf_change_8(pd, &icmp6->icmp6_type, type, PF_HI); 
+               pf_change_8(pd, &icmp6->icmp6_code, code, PF_LO); 
+
                /* aligns well with a icmpv4 nextmtu */
-               icmp6->icmp6_mtu = htonl(mtu);
+               pf_change_32(pd, &icmp6->icmp6_mtu, htonl(mtu));
+
                /* icmpv4 pptr is a one most significant byte */
                if (ptr >= 0)
-                       icmp6->icmp6_pptr = htonl(ptr << 24);
+                       pf_change_32(pd, &icmp6->icmp6_pptr, htonl(ptr << 24));
                break;
        case AF_INET6:
                icmp4 = arg;
@@ -2231,11 +2454,12 @@
                default:
                        return (-1);
                }
-               icmp4->icmp_type = type;
-               icmp4->icmp_code = code;
-               icmp4->icmp_nextmtu = htons(mtu);
-               if (ptr >= 0)
-                       icmp4->icmp_void = htonl(ptr);
+
+               pf_change_8(pd, &icmp4->icmp_type, type, PF_HI); 
+               pf_change_8(pd, &icmp4->icmp_code, code, PF_LO); 
+               pf_change_16(pd, &icmp4->icmp_nextmtu, htons(mtu));
+               if (ptr >= 0) /* => ICMP_PARAMPROB */
+                       pf_change_32(pd, &icmp4->icmp_void, htonl(ptr));
                break;
        }
 
@@ -2253,7 +2477,7 @@
        struct tcphdr   *th = pd->hdr.tcp;
        int              hlen = (th->th_off << 2) - sizeof(*th);
        int              thoptlen = hlen;
-       u_int8_t         opts[MAX_TCPOPTLEN], *opt = opts;
+       u_int8_t         opts[MAX_TCPOPTLEN], *opt = opts; 
        int              copyback = 0, i, olen;
        struct sackblk   sack;
 
@@ -2276,13 +2500,17 @@
                        if (olen >= TCPOLEN_SACKLEN) {
                                for (i = 2; i + TCPOLEN_SACK <= olen;
                                    i += TCPOLEN_SACK) {
+                                       size_t startoff = (opt + i) - opts; 
                                        memcpy(&sack, &opt[i], sizeof(sack));
                                        pf_change_32_unaligned(pd, &sack.start,
                                            htonl(ntohl(sack.start) -
-                                           dst->seqdiff));
+                                               dst->seqdiff),
+                                           PF_ALGNMNT(startoff));
                                        pf_change_32_unaligned(pd, &sack.end,
                                            htonl(ntohl(sack.end) -
-                                           dst->seqdiff));
+                                               dst->seqdiff),
+                                           PF_ALGNMNT(startoff +
+                                               sizeof(sack.start)));
                                        memcpy(&opt[i], &sack, sizeof(sack));
                                }
                                copyback = 1;
@@ -3408,7 +3636,6 @@
 
        /* copy back packet headers if needed */
        if (rewrite && pd->hdrlen) {
-               pf_cksum(pd, pd->m);
                m_copyback(pd->m, pd->off, pd->hdrlen, pd->hdr.any, M_NOWAIT);
        }
 
@@ -3500,7 +3727,7 @@
                            0)
                                s->src.seqdiff = 1;
                        pf_change_32_unaligned(pd, &th->th_seq,
-                           htonl(s->src.seqlo + s->src.seqdiff));
+                           htonl(s->src.seqlo + s->src.seqdiff), PF_HI);
                        *rewrite = 1;
                } else
                        s->src.seqdiff = 0;
@@ -3691,20 +3918,20 @@
 
                if (afto) {
 #ifdef INET6
-                       if (pf_translate_icmp_af(AF_INET6, pd->hdr.icmp))
+                       if (pf_translate_icmp_af(pd, AF_INET6, pd->hdr.icmp))
                                return (0);
                        pd->proto = IPPROTO_ICMPV6;
                        rewrite = 1;
 #endif /* INET6 */
                } else {
                        if (PF_ANEQ(saddr, pd->src, pd->af)) {
-                               pf_change_32_unaligned(pd, &pd->src->v4.s_addr,
-                                   saddr->v4.s_addr);
+                               pf_change_a(pd, pd->src, saddr,
+                                   pd->af, pd->af);
                                rewrite = 1;
                        }
                        if (PF_ANEQ(daddr, pd->dst, pd->af)) {
-                               pf_change_32_unaligned(pd, &pd->dst->v4.s_addr,
-                                   daddr->v4.s_addr);
+                               pf_change_a(pd, pd->dst, daddr,
+                                   pd->af, pd->af);
                                rewrite = 1;
                        }
                }
@@ -3712,11 +3939,8 @@
                        u_int16_t icmpid = (icmp_dir == PF_IN) ? sport : dport;
 
                        if (icmpid != pd->hdr.icmp->icmp_id) {
-                               if (pd->csum_status == PF_CSUM_UNKNOWN)
-                                       pf_check_proto_cksum(pd, pd->off,
-                                           pd->tot_len - pd->off, pd->proto,
-                                           pd->af);
-                               pd->hdr.icmp->icmp_id = icmpid;
+                               pf_change_16(pd, 
+                                   &pd->hdr.icmp->icmp_id, icmpid);
                                rewrite = 1;
                        }
                }
@@ -3730,7 +3954,7 @@
 
                if (afto) {
                        /* ip_sum will be recalculated in pf_translate_af */
-                       if (pf_translate_icmp_af(AF_INET, pd->hdr.icmp6))
+                       if (pf_translate_icmp_af(pd, AF_INET, pd->hdr.icmp6))
                                return (0);
                        pd->proto = IPPROTO_ICMP;
                        rewrite = 1;
@@ -3748,11 +3972,8 @@
                        u_int16_t icmpid = (icmp_dir == PF_IN) ? sport : dport;
 
                        if (icmpid != pd->hdr.icmp6->icmp6_id) {
-                               if (pd->csum_status == PF_CSUM_UNKNOWN)
-                                       pf_check_proto_cksum(pd, pd->off,
-                                           pd->tot_len - pd->off, pd->proto,
-                                           pd->af);
-                               pd->hdr.icmp6->icmp6_id = icmpid;
+                               pf_change_16(pd, 
+                                   &pd->hdr.icmp6->icmp6_id, icmpid);
                                rewrite = 1;
                        }
                }
@@ -3763,13 +3984,13 @@
                switch (pd->af) {
                case AF_INET:
                        if (!afto && PF_ANEQ(saddr, pd->src, pd->af)) {
-                               pf_change_32_unaligned(pd, &pd->src->v4.s_addr,
-                                   saddr->v4.s_addr);
+                               pf_change_a(pd, pd->src, saddr,
+                                   pd->af, pd->af);
                                rewrite = 1;
                        }
                        if (!afto && PF_ANEQ(daddr, pd->dst, pd->af)) {
-                               pf_change_32_unaligned(pd, &pd->dst->v4.s_addr,
-                                   daddr->v4.s_addr);
+                               pf_change_a(pd, pd->dst, daddr, 
+                                   pd->af, pd->af);
                                rewrite = 1;
                        }
                        break;
@@ -3831,8 +4052,9 @@
                        while ((src->seqdiff = arc4random() - seq) == 0)
                                ;
                        ack = ntohl(th->th_ack) - dst->seqdiff;
-                       pf_change_32_unaligned(pd, &th->th_seq, htonl(seq + src->seqdiff));
-                       pf_change_32_unaligned(pd, &th->th_ack, htonl(ack));
+                       pf_change_32_unaligned(pd, &th->th_seq, htonl(seq +
+                                                  src->seqdiff), PF_HI);
+                       pf_change_32_unaligned(pd, &th->th_ack, htonl(ack), PF_HI);
                        *copyback = 1;
                } else {
                        ack = ntohl(th->th_ack);
@@ -3883,8 +4105,9 @@
                ack = ntohl(th->th_ack) - dst->seqdiff;
                if (src->seqdiff) {
                        /* Modulate sequence numbers */
-                       pf_change_32_unaligned(pd, &th->th_seq, htonl(seq + src->seqdiff));
-                       pf_change_32_unaligned(pd, &th->th_ack, htonl(ack));
+                       pf_change_32_unaligned(pd, &th->th_seq, htonl(seq +
+                                                  src->seqdiff), PF_HI);
+                       pf_change_32_unaligned(pd, &th->th_ack, htonl(ack), PF_HI);
                        *copyback = 1;
                }
                end = seq + pd->p_len;
@@ -4419,7 +4642,6 @@
        }
 
        if (copyback && pd->hdrlen > 0) {
-               pf_cksum(pd, pd->m);
                m_copyback(pd->m, pd->off, pd->hdrlen, pd->hdr.any, M_NOWAIT);
        }
 
@@ -4542,7 +4764,7 @@
                        case AF_INET:
 #ifdef INET6
                                if (afto) {
-                                       if (pf_translate_icmp_af(AF_INET6,
+                                       if (pf_translate_icmp_af(pd, AF_INET6,
                                            pd->hdr.icmp))
                                                return (PF_DROP);
                                        pd->proto = IPPROTO_ICMPV6;
@@ -4550,22 +4772,19 @@
 #endif /* INET6 */
                                if (!afto && PF_ANEQ(pd->src,
                                    &nk->addr[sidx], AF_INET))
-                                       pf_change_32_unaligned(pd, &saddr->v4.s_addr,
-                                           nk->addr[sidx].v4.s_addr);
+                                       pf_change_a(pd, saddr, &nk->addr[sidx],
+                                           pd->af, pd->af);
 
                                if (!afto && PF_ANEQ(pd->dst,
                                    &nk->addr[didx], AF_INET)) {
-                                       pf_change_32_unaligned(pd, &daddr->v4.s_addr,
-                                           nk->addr[didx].v4.s_addr);
+                                       pf_change_a(pd, daddr, &nk->addr[didx],
+                                           pd->af, pd->af);
                                        pd->destchg = 1;
                                }
 
                                if (nk->port[iidx] !=  pd->hdr.icmp->icmp_id) {
-                                       if (pd->csum_status == PF_CSUM_UNKNOWN)
-                                               pf_check_proto_cksum(pd,
-                                                   pd->off, pd->tot_len -
-                                                   pd->off, pd->proto, pd->af);
-                                       pd->hdr.icmp->icmp_id = nk->port[iidx];
+                                       pf_change_16(pd, &pd->hdr.icmp->icmp_id,
+                                           nk->port[iidx]);
                                }
 
                                m_copyback(pd->m, pd->off, ICMP_MINLEN,
@@ -4575,7 +4794,7 @@
 #ifdef INET6
                        case AF_INET6:
                                if (afto) {
-                                       if (pf_translate_icmp_af(AF_INET,
+                                       if (pf_translate_icmp_af(pd, AF_INET,
                                            pd->hdr.icmp6))
                                                return (PF_DROP);
                                        pd->proto = IPPROTO_ICMP;
@@ -4593,12 +4812,9 @@
                                }
 
                                if (nk->port[iidx] != pd->hdr.icmp6->icmp6_id) {
-                                       if (pd->csum_status == PF_CSUM_UNKNOWN)
-                                               pf_check_proto_cksum(pd,
-                                                   pd->off, pd->tot_len -
-                                                   pd->off, pd->proto, pd->af);
-                                       pd->hdr.icmp6->icmp6_id =
-                                           nk->port[iidx];
+                                       pf_change_16(pd, 
+                                           &pd->hdr.icmp6->icmp6_id, 
+                                           nk->port[iidx]);
                                }
 
                                m_copyback(pd->m, pd->off,
@@ -4747,7 +4963,8 @@
                        /* Demodulate sequence number */
                        seq = ntohl(th.th_seq) - src->seqdiff;
                        if (src->seqdiff) {
-                               pf_change_32_unaligned(pd, &th.th_seq, htonl(seq));
+                               pf_change_32_unaligned(pd, &th.th_seq,
+                                   htonl(seq), PF_HI);
                                copyback = 1;
                        }
 
@@ -4798,7 +5015,7 @@
 
 #if INET && INET6
                                if (afto) {
-                                       if (pf_translate_icmp_af(nk->af,
+                                       if (pf_translate_icmp_af(pd, nk->af,
                                            pd->hdr.icmp))
                                                return (PF_DROP);
                                        m_copyback(pd->m, pd->off,
@@ -4913,7 +5130,7 @@
 
 #if INET && INET6
                                if (afto) {
-                                       if (pf_translate_icmp_af(nk->af,
+                                       if (pf_translate_icmp_af(pd, nk->af,
                                            pd->hdr.icmp))
                                                return (PF_DROP);
                                        m_copyback(pd->m, pd->off,
@@ -4985,7 +5202,10 @@
                                        break;
 #endif /* INET6 */
                                }
-                               uh.uh_sum = 0;
+                               /* Avoid recomputing quoted UDP checksum.
+                                * note: udp6 0 csum invalid per rfc2460 p27. 
+                                * but presumed nothing cares in this context */
+                               pf_change_16(pd, &uh.uh_sum, 0);
                                m_copyback(pd2.m, pd2.off, sizeof(uh), &uh,
                                    M_NOWAIT);
                                copyback = 1;
@@ -5036,7 +5256,7 @@
                                if (afto) {
                                        if (nk->af != AF_INET6)
                                                return (PF_DROP);
-                                       if (pf_translate_icmp_af(nk->af,
+                                       if (pf_translate_icmp_af(pd, nk->af,
                                            pd->hdr.icmp))
                                                return (PF_DROP);
                                        m_copyback(pd->m, pd->off,
@@ -5047,7 +5267,8 @@
                                            &nk->addr[didx], pd->af, nk->af))
                                                return (PF_DROP);
                                        pd->proto = IPPROTO_ICMPV6;
-                                       if (pf_translate_icmp_af(nk->af, &iih))
+                                       if (pf_translate_icmp_af(pd,
+                                               nk->af, &iih))
                                                return (PF_DROP);
                                        if (virtual_type == htons(ICMP_ECHO) &&
                                            nk->port[iidx] != iih.icmp_id)
@@ -5147,7 +5368,7 @@
                                if (afto) {
                                        if (nk->af != AF_INET)
                                                return (PF_DROP);
-                                       if (pf_translate_icmp_af(nk->af,
+                                       if (pf_translate_icmp_af(pd, nk->af,
                                            pd->hdr.icmp))
                                                return (PF_DROP);
                                        m_copyback(pd->m, pd->off,
@@ -5158,7 +5379,7 @@
                                            &nk->addr[didx], pd->af, nk->af))
                                                return (PF_DROP);
                                        pd->proto = IPPROTO_ICMP;
-                                       if (pf_translate_icmp_af(nk->af, &iih))
+                                       if (pf_translate_icmp_af(pd, nk->af, &iih))
                                                return (PF_DROP);
                                        if (virtual_type ==
                                            htons(ICMP6_ECHO_REQUEST) &&
@@ -5272,7 +5493,6 @@
                }
        }
        if (copyback) {
-               pf_cksum(pd, pd->m);
                m_copyback(pd->m, pd->off, pd->hdrlen, pd->hdr.any, M_NOWAIT);
        }
 
@@ -6215,6 +6435,7 @@
                        REASON_SET(reason, PFRES_SHORT);
                        return (PF_DROP);
                }
+               pd->pcksum = &pd->hdr.icmp6->icmp6_cksum;
                break;
        }
 #endif /* INET6 */
@@ -6627,37 +6848,6 @@
        }
 
        return (action);
-}
-
-void
-pf_cksum(struct pf_pdesc *pd, struct mbuf *m)
-{
-       if (pd->csum_status != PF_CSUM_OK)
-               return; /* don't fix broken cksums */
-
-       switch (pd->proto) {
-       case IPPROTO_TCP:
-               pd->hdr.tcp->th_sum = 0;
-               m->m_pkthdr.csum_flags |= M_TCP_CSUM_OUT;
-               break;
-       case IPPROTO_UDP:
-               pd->hdr.udp->uh_sum = 0;
-               m->m_pkthdr.csum_flags |= M_UDP_CSUM_OUT;
-               break;
-       case IPPROTO_ICMP:
-               pd->hdr.icmp->icmp_cksum = 0;
-               m->m_pkthdr.csum_flags |= M_ICMP_CSUM_OUT;
-               break;
-#ifdef INET6
-       case IPPROTO_ICMPV6:
-               pd->hdr.icmp6->icmp6_cksum = 0;
-               m->m_pkthdr.csum_flags |= M_ICMP_CSUM_OUT;
-               break;
-#endif /* INET6 */
-       default:
-               /* nothing */
-               break;
-       }
 }
 
 /*
--- pfvar.h.000 Wed Jun  3 09:15:16 2015
+++ pfvar.h     Wed Jun  3 22:24:38 2015
@@ -1813,7 +1813,13 @@
 
 void   *pf_pull_hdr(struct mbuf *, int, void *, int, u_short *, u_short *,
            sa_family_t);
-void   pf_change_32_unaligned(struct pf_pdesc *, void *, u_int32_t);
+#define PF_HI (true)
+#define PF_LO (!PF_HI)
+#define PF_ALGNMNT(off) (((off) % 2) == 0 ? PF_HI : PF_LO)
+void    pf_change_8(struct pf_pdesc *, u_int8_t *, u_int8_t, bool);
+void    pf_change_16(struct pf_pdesc *, u_int16_t *, u_int16_t);
+void    pf_change_16_unaligned(struct pf_pdesc *, void *, u_int16_t, bool);
+void   pf_change_32_unaligned(struct pf_pdesc *, void *, u_int32_t, bool);
 int    pf_check_proto_cksum(struct pf_pdesc *, int, int, u_int8_t,
            sa_family_t);
 int    pflog_packet(struct pf_pdesc *, u_int8_t, struct pf_rule *,
@@ -1998,8 +2004,6 @@
                            struct pf_pool *, enum pf_sn_types);
 
 int                     pf_postprocess_addr(struct pf_state *);
-
-void                    pf_cksum(struct pf_pdesc *, struct mbuf *);
 
 #endif /* _KERNEL */
 
--- pf_norm.c.000       Wed Jun  3 09:15:16 2015
+++ pf_norm.c   Wed Jun  3 13:40:32 2015
@@ -832,10 +832,6 @@
        u_int8_t         flags;
        u_int            rewrite = 0;
 
-       if (pd->csum_status == PF_CSUM_UNKNOWN)
-               pf_check_proto_cksum(pd, pd->off, pd->tot_len - pd->off,
-                   pd->proto, pd->af);
-
        flags = th->th_flags;
        if (flags & TH_SYN) {
                /* Illegal packet */
@@ -857,21 +853,23 @@
        }
 
        /* If flags changed, or reserved data set, then adjust */
-       if (flags != th->th_flags || th->th_x2 != 0) {
-               th->th_flags = flags;
-               th->th_x2 = 0;
-               rewrite = 1;
-       }
+       if (flags != th->th_flags || th->th_x2 != 0) {
+               /* hack: set 4-bit th_x2 = 0 */
+               u_int8_t *th_off = (u_int8_t*)(&th->th_ack+1);
+               pf_change_8(pd, th_off, th->th_off << 4, PF_HI);
 
+               pf_change_8(pd, &th->th_flags, flags, PF_LO);
+               rewrite = 1;
+       }
+
        /* Remove urgent pointer, if TH_URG is not set */
        if (!(flags & TH_URG) && th->th_urp) {
-               th->th_urp = 0;
+               pf_change_16(pd, &th->th_urp, 0);
                rewrite = 1;
        }
 
        /* copy back packet headers if we sanitized */
        if (rewrite) {
-               pf_cksum(pd, pd->m);
                m_copyback(pd->m, pd->off, sizeof(*th), th, M_NOWAIT);
        }
 
@@ -985,7 +983,7 @@
        u_int32_t        tsval, tsecr;
        u_int            tsval_from_last;
        u_int8_t         hdr[60];
-       u_int8_t        *opt;
+       u_int8_t        *opts, *opt;
        int              copyback = 0;
        int              got_ts = 0;
 
@@ -1024,7 +1022,7 @@
            pd->af)) {
                /* Diddle with TCP options */
                int hlen;
-               opt = hdr + sizeof(struct tcphdr);
+               opt = opts = hdr + sizeof(struct tcphdr);
                hlen = (th->th_off << 2) - sizeof(struct tcphdr);
                while (hlen >= TCPOLEN_TIMESTAMP) {
                        switch (*opt) {
@@ -1052,28 +1050,37 @@
                                        return (PF_DROP);
                                }
                                if (opt[1] >= TCPOLEN_TIMESTAMP) {
-                                       memcpy(&tsval, &opt[2],
-                                           sizeof(u_int32_t));
+                                       u_int8_t *ts, *tsr; 
+                                       ts = opt + 2; 
+                                       memcpy(&tsval, ts, sizeof(u_int32_t));
+
+                                       /* modulate TS */
                                        if (tsval && src->scrub &&
                                            (src->scrub->pfss_flags &
                                            PFSS_TIMESTAMP)) {
+                                               /* note: tsval used later on */
                                                tsval = ntohl(tsval);
-                                               pf_change_32_unaligned(pd, &opt[2],
+                                               pf_change_32_unaligned(pd, ts,
                                                    htonl(tsval +
-                                                   src->scrub->pfss_ts_mod));
+                                                       src->scrub->pfss_ts_mod),
+                                                   PF_ALGNMNT(ts - opts));
                                                copyback = 1;
                                        }
 
-                                       /* Modulate TS reply iff valid (!0) */
-                                       memcpy(&tsecr, &opt[6],
+                                       /* modulate TS reply if any */
+                                       tsr = opt + 6;
+                                       memcpy(&tsecr, tsr,
                                            sizeof(u_int32_t));
+
                                        if (tsecr && dst->scrub &&
                                            (dst->scrub->pfss_flags &
                                            PFSS_TIMESTAMP)) {
+                                               /* note: tsecr used later on */
                                                tsecr = ntohl(tsecr)
                                                    - dst->scrub->pfss_ts_mod;
-                                               pf_change_32_unaligned(pd, &opt[6],
-                                                   htonl(tsecr));
+                                               pf_change_32_unaligned(pd, tsr,
+                                                   htonl(tsecr), 
+                                                   PF_ALGNMNT(tsr - opts));
                                                copyback = 1;
                                        }
                                        got_ts = 1;
@@ -1366,13 +1373,9 @@
        u_int16_t        mss;
        int              thoff;
        int              opt, cnt, optlen = 0;
-       u_char           opts[MAX_TCPOPTLEN];
-       u_char          *optp = opts;
+       u_int8_t         opts[MAX_TCPOPTLEN];
+       u_int8_t        *optp = opts;
 
-       if (pd->csum_status == PF_CSUM_UNKNOWN)
-               pf_check_proto_cksum(pd, pd->off, pd->tot_len - pd->off,
-                   pd->proto, pd->af);
-
        thoff = th->th_off << 2;
        cnt = thoff - sizeof(struct tcphdr);
 
@@ -1394,15 +1397,18 @@
                                break;
                }
                if (opt == TCPOPT_MAXSEG) {
-                       memcpy(&mss, (optp + 2), 2);
+                       u_int8_t *mssp = optp + 2;
+                       memcpy(&mss, mssp, sizeof(mss));
                        if (ntohs(mss) > maxmss) {
-                               mss = htons(maxmss);
+                               size_t mssoptsoff = mssp - opts;
+                               pf_change_16_unaligned(pd, &mss,
+                                   htons(maxmss), PF_ALGNMNT(mssoptsoff));
                                m_copyback(pd->m,
-                                   pd->off + sizeof(*th) + optp + 2 - opts,
-                                   2, &mss, M_NOWAIT);
-                               pf_cksum(pd, pd->m);
-                               m_copyback(pd->m, pd->off, sizeof(*th), th,
-                                   M_NOWAIT);
+                                   pd->off + sizeof(*th) + mssoptsoff,
+                                   sizeof(mss), &mss, M_NOWAIT);
+                               m_copyback(pd->m,
+                                   pd->off,
+                                   sizeof(*th), th, M_NOWAIT);
                        }
                }
        }
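
For anyone reviewing the hi/lo argument used above, here is a minimal
standalone sketch of the idea, assuming the usual RFC 1624 incremental
update HC' = ~(~HC + ~m + m'). It is not the code from this patch: the
names cksum_full, cksum_fixup16, cksum_fixup8 and BYTE_IS_HI are made
up for illustration, and BYTE_IS_HI only mirrors what PF_ALGNMNT does
with an offset. A byte at an even offset from the start of the
checksummed data is the high half of its 16-bit word, a byte at an odd
offset is the low half.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: even offsets are the high byte of a 16-bit word. */
#define BYTE_IS_HI(off) (((off) % 2) == 0)

/* Reference: full Internet checksum over big-endian 16-bit words. */
static uint16_t
cksum_full(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;	/* pad odd byte */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* fold carries */
	return (uint16_t)~sum;
}

/* Incremental update for one 16-bit word changing from old to new. */
static uint16_t
cksum_fixup16(uint16_t cksum, uint16_t old, uint16_t new)
{
	uint32_t sum = (uint16_t)~cksum;

	sum += (uint16_t)~old;
	sum += new;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Incremental update for one byte; hi selects which half of the word. */
static uint16_t
cksum_fixup8(uint16_t cksum, uint8_t old, uint8_t new, bool hi)
{
	if (hi)
		return cksum_fixup16(cksum, (uint16_t)old << 8,
		    (uint16_t)new << 8);
	return cksum_fixup16(cksum, old, new);
}
```

Changing byte 5 of a buffer would then be
cksum_fixup8(c, buf[5], newval, BYTE_IS_HI(5)), and the result should
agree with cksum_full() over the modified buffer as long as the data
is not all zero (the two representations of zero in one's complement
are the known corner case).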

===============================================================================
avoid unnecessary calls to pf_change_32_unaligned
--- pf.c.001    Thu Jun  4 13:45:04 2015
+++ pf.c        Thu Jun  4 13:45:38 2015
@@ -3726,8 +3726,8 @@
                        if ((s->src.seqdiff = pf_tcp_iss(pd) - s->src.seqlo) ==
                            0)
                                s->src.seqdiff = 1;
-                       pf_change_32_unaligned(pd, &th->th_seq,
-                           htonl(s->src.seqlo + s->src.seqdiff), PF_HI);
+                       pf_change_32(pd, &th->th_seq, 
+                                     htonl(s->src.seqlo + s->src.seqdiff));
                        *rewrite = 1;
                } else
                        s->src.seqdiff = 0;
@@ -4052,9 +4052,8 @@
                        while ((src->seqdiff = arc4random() - seq) == 0)
                                ;
                        ack = ntohl(th->th_ack) - dst->seqdiff;
-                       pf_change_32_unaligned(pd, &th->th_seq, htonl(seq +
-                                                  src->seqdiff), PF_HI);
-                       pf_change_32_unaligned(pd, &th->th_ack, htonl(ack), PF_HI);
+                       pf_change_32(pd, &th->th_seq, htonl(seq + src->seqdiff));
+                       pf_change_32(pd, &th->th_ack, htonl(ack));
                        *copyback = 1;
                } else {
                        ack = ntohl(th->th_ack);
@@ -4105,9 +4104,10 @@
                ack = ntohl(th->th_ack) - dst->seqdiff;
                if (src->seqdiff) {
                        /* Modulate sequence numbers */
-                       pf_change_32_unaligned(pd, &th->th_seq, htonl(seq +
-                                                  src->seqdiff), PF_HI);
-                       pf_change_32_unaligned(pd, &th->th_ack, htonl(ack), PF_HI);
+                       pf_change_32(pd, &th->th_seq, 
+                           htonl(seq + src->seqdiff));
+                       pf_change_32(pd, &th->th_ack, 
+                           htonl(ack));
                        *copyback = 1;
                }
                end = seq + pd->p_len;
@@ -4963,8 +4963,7 @@
                        /* Demodulate sequence number */
                        seq = ntohl(th.th_seq) - src->seqdiff;
                        if (src->seqdiff) {
-                               pf_change_32_unaligned(pd, &th.th_seq,
-                                   htonl(seq), PF_HI);
+                               pf_change_32(pd, &th.th_seq, htonl(seq));
                                copyback = 1;
                        }
 






