Re: [External] : pf route-to: only run pf_test when packets enter and leave the stack

2021-02-03 Thread Alexandr Nedvedicky
Hello,


>   pass in on em0 from v.x.y.z/n to a.b.c.d/m \
>   route-to o.p.q.r nat-to (em2)
> 
> > then this needs to be converted to two rules:
> > 
> > match in on em0 from v.x.y.z/n to a.b.c.d/m nat-to(em2)
> > pass in on em0 from v.x.y.z/n to a.b.c.d/m route-to o.p.q.r
> > 
> > I have not tried that yet. However I think this should work. If it does
> > not work, then I'll try to fix it.
> 
> I thought the problem was for rules like this:
> 
>   pass out on em1 from v.x.y.z/n to a.b.c.d/m \
>   route-to o.p.q.r@em2
>   pass out on em2 nat-to (em2)
> 

correct, combination of NAT and route-to is problem
on outbound rules. I failed to type my example right.

> Only one pass out rule will win if I commit this, because the packet
> will only go through the ruleset when it leaves the stack, not every
> time the interface changes. If we can do match route-to rules, we could
> do the following:
> 
>   match out on em1 from v.x.y.z/n to a.b.c.d/m \
>   route-to o.p.q.r # o.p.q.r is reachable via em2
>   pass out on em2 nat-to (em2) 
> 

my idea is to fix combined outbound rule with pair of rules. So let
me retry. Let there be an outbound rule:

pass out on em1 from v.x.y.z/n to a.b.c.d/m \
route-to o.p.q.r@em2 nat-to(em2)

the 'route-to o.p.q.r@em2' will get changed to 'route-to o.p.q.r',
assuming the o.p.q.r is reachable via em2. The nat-to action
will be moved to 'match rule'. The rule above needs to be
changed to pair of rules:

match out on em1 from v.x.y.z/n to a.b.c.d/m nat-to(em2)
pass out on em1 from v.x.y.z/n to a.b.c.d/m \
route-to o.p.q.r # o.p.q.r is reachable via em2

I believe this is something what should work already.


thanks and
regards
sashan



Re: [External] : pf route-to: only run pf_test when packets enter and leave the stack

2021-02-02 Thread Alexandr Nedvedicky
Hello,


On Tue, Feb 02, 2021 at 02:52:52PM +1000, David Gwynne wrote:
> 
> however, like most things relating to route-to/reply-to/dup-to, im
> pretty sure at this point it's not used a lot, so the impact is minimal.
> a lot of changes in this space have already been made, so adding another
> simplification is justifiable. if this does remove functionality that
> people need, i believe sashan@ has agreed to help me implement route-to
> on match rules to give more flexibility and composability of rules.
> 

as David says my concern is single corner case, which combines
NAT with route-to action. I think the escape plan for people,
who combine route-to with nat-to, is already there. If someone
has rule as follows:

pass in on em0 from v.x.y.z/n to a.b.c.d/m \
route-to o.p.q.r@em2 nat-to(em2)

then this needs to be converted to two rules:

match in on em0 from v.x.y.z/n to a.b.c.d/m nat-to(em2)
pass in on em0 from v.x.y.z/n to a.b.c.d/m route-to o.p.q.r

I have not tried that yet. However I think this should work. If it does
not work, then I'll try to fix it.

> i've canvassed a few people, and their responses have varied from "i
> don't care, route-to is the worst" to "i thought we did option 2
> anyway". anyone else want to chime in?
> 
> this keeps the behaviour where route-to on a packet coming into the
> stack is pushed past it and immediately forwarded to the output
> interface. the condition for that is greatly simplified now though.
> 
> ok?

given there is an escape plan, I'm fine with the change.

OK sashan

> 
> Index: pf.c
> ===
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1106
> diff -u -p -r1.1106 pf.c
> --- pf.c  1 Feb 2021 00:31:05 -   1.1106
> +++ pf.c  2 Feb 2021 03:44:51 -
> @@ -6033,7 +6033,7 @@ pf_route(struct pf_pdesc *pd, struct pf_
>   (ifp->if_flags & IFF_LOOPBACK) == 0)
>   ip->ip_src = ifatoia(rt->rt_ifa)->ia_addr.sin_addr;
>  
> - if (s->rt != PF_DUPTO && pd->kif->pfik_ifp != ifp) {
> + if (s->rt != PF_DUPTO && pd->dir == PF_IN) {
>   if (pf_test(AF_INET, PF_OUT, ifp, ) != PF_PASS)
>   goto bad;
>   else if (m0 == NULL)
> @@ -6178,7 +6178,7 @@ pf_route6(struct pf_pdesc *pd, struct pf
>   (ifp->if_flags & IFF_LOOPBACK) == 0)
>   ip6->ip6_src = ifatoia6(rt->rt_ifa)->ia_addr.sin6_addr;
>  
> - if (s->rt != PF_DUPTO && pd->kif->pfik_ifp != ifp) {
> + if (s->rt != PF_DUPTO && pd->dir == PF_IN) {
>   if (pf_test(AF_INET6, PF_OUT, ifp, ) != PF_PASS)
>   goto bad;
>   else if (m0 == NULL)
> 
> 



Re: [External] : pf: route-to IPs, not interfaces

2021-01-28 Thread Alexandr Nedvedicky
Hello David,

thanks for nice wrap up of the story...


> 
> this change does the following:
> 
> - stores the route info in the state instead of the pf rule
> 
>   this allows route-to to keep working when the ruleset changes, and
>   allows route-to info to be sent over pfsync. there's enough spare bits
>   in pfsync messages that the protocol doesnt break.
> 
>   the caveat is that route-to becomes tied to pass rules that create
>   state, like rdr-to and nat-to.
> 
> - the argument to route-to etc is a destination ip address
> 
>   it's not limited to a next-hop address (thought a next-hop can be a
>   destination address). this allows for the failover and load balancing
>   referred to above.
> 
> - deprecates the address@interface host syntax in pfctl
> 
>   because routing is done entirely by IPs, the interface is derived from
>   the route lookup, not pf.

I think this requires a notion in changelog.

> 
> this change does not affect some other stuff discussed in the thread:
> 
> - it keeps the current semantic where when route-to changes which
>   interface the packet is travelling over, it runs pf_test again.
> 
>   that's a separate change for broader discussion.
> 

OK sashan



Re: [External] : handle PFRULE_ONCE before pfsync may defer tx of the packet

2021-01-27 Thread Alexandr Nedvedicky
Hello,

On Thu, Jan 28, 2021 at 11:47:30AM +1000, David Gwynne wrote:
> i think these code chunks are around the wrong way.
> 
> pfsync may want to defer the transmission of a packet. it does this so
> it can try and get a state over to a peer firewall before a host may
> send a reply to the peer, which would get dropped cos there's no
> matching state.
> 
> i think the once rule processing should happen before that. the state
> is created from the rule, whether the packet the state is for goes out
> immediately or not shouldn't matter.
> 
> ok?


yes it makes sense. thanks.

OK sashan



Re: [External] : if pf_route{,6} route isn't valid, generate an icmp error

2021-01-26 Thread Alexandr Nedvedicky
Hello,

On Wed, Jan 27, 2021 at 04:41:01PM +1000, David Gwynne wrote:
> at the moment if the route is invalid, we drop the packet. this
> generates an icmp error.
> 
> ok?

looks good to me.

ok sashan

> 
> Index: pf.c
> ===
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1103
> diff -u -p -r1.1103 pf.c
> --- pf.c  27 Jan 2021 04:46:21 -  1.1103
> +++ pf.c  27 Jan 2021 06:38:12 -
> @@ -6055,6 +6055,10 @@ pf_route(struct pf_pdesc *pd, struct pf_
>  
>   rt = rtalloc(sintosa(dst), RT_RESOLVE, rtableid);
>   if (!rtisvalid(rt)) {
> + if (r->rt != PF_DUPTO) {
> + pf_send_icmp(m0, ICMP_UNREACH, ICMP_UNREACH_HOST,
> + 0, pd->af, s->rule.ptr, pd->rdomain);
> + }
>   ipstat_inc(ips_noroute);
>   goto bad;
>   }
> @@ -6210,6 +6214,11 @@ pf_route6(struct pf_pdesc *pd, struct pf
>   dst->sin6_addr.s6_addr16[1] = htons(ifp->if_index);
>   rt = rtalloc(sin6tosa(dst), RT_RESOLVE, rtableid);
>   if (!rtisvalid(rt)) {
> + if (r->rt != PF_DUPTO) {
> + pf_send_icmp(m0, ICMP6_DST_UNREACH,
> + ICMP6_DST_UNREACH_NOROUTE, 0,
> + pd->af, s->rule.ptr, pd->rdomain);
> + }
>   ip6stat_inc(ip6s_noroute);
>   goto bad;
>   }
> 



Re: [External] : don't run dup-to generated packets through pf_test in pf_route{,6}

2021-01-26 Thread Alexandr Nedvedicky
On Wed, Jan 27, 2021 at 11:31:27AM +1000, David Gwynne wrote:
> this was discussed as part of the big route-to issues thread. i think
> it's easy to break out and handle separately now.
> 
> the diff does what the subject line says. it seems to work as expected
> for me. i don't see weird state issues anymore when i dup my ssh session
> out over a tunnel interface.
> 
> sasha suggested setting PF_TAG_GENERATED on the duplicated packet, but i
> didn't set it in this diff. the reason is that i can't see
> PF_TAG_GENERATED get cleared anywhere. this means that if you dup-to a
> host over a tunnel (eg, gif, gre, etc), the encapsulated packet still
> has that tag, which means pf doesn't run against the encapsulated
> packet.

I'm taking a note to take a look. this feels like a bug to me.

> 
> ok?

OK sashan@
> 
> Index: pf.c
> ===
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1101
> diff -u -p -r1.1101 pf.c
> --- pf.c  19 Jan 2021 22:22:23 -  1.1101
> +++ pf.c  27 Jan 2021 01:21:24 -
> @@ -6039,7 +6041,7 @@ pf_route(struct pf_pdesc *pd, struct pf_
>   if (ifp == NULL)
>   goto bad;
>  
> - if (pd->kif->pfik_ifp != ifp) {
> + if (r->rt != PF_DUPTO && pd->kif->pfik_ifp != ifp) {
>   if (pf_test(AF_INET, PF_OUT, ifp, ) != PF_PASS)
>   goto bad;
>   else if (m0 == NULL)
> @@ -6194,7 +6195,7 @@ pf_route6(struct pf_pdesc *pd, struct pf
>   if (ifp == NULL)
>   goto bad;
>  
> - if (pd->kif->pfik_ifp != ifp) {
> + if (r->rt != PF_DUPTO && pd->kif->pfik_ifp != ifp) {
>   if (pf_test(AF_INET6, PF_OUT, ifp, ) != PF_PASS)
>   goto bad;
>   else if (m0 == NULL)
> 



Re: [External] : tiny pf_route{,6} tweak

2021-01-26 Thread Alexandr Nedvedicky
On Wed, Jan 27, 2021 at 11:13:12AM +1000, David Gwynne wrote:
> when pf_route (and pf_route6) are supposed to handle forwarding the
> packet (ie, for route-to or reply-to rules), they take the mbuf
> away from the calling code path. this is done by clearing the mbuf
> pointer in the pf_pdesc struct. it doesn't do this for dup-to rules
> though.
> 
> at the moment pf_route clears that pointer on the way out, but it could
> take the mbuf away up front in the same place that it already checks if
> it's a dup-to rule or not.
> 
> it's a small change. i've bumped up the number of lines of context so
> it's easier to read too.
> 
> ok?

OK sashan@
> 
> Index: pf.c
> ===
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1101
> diff -u -p -r1.1101 pf.c
> --- pf.c  19 Jan 2021 22:22:23 -  1.1101
> +++ pf.c  27 Jan 2021 01:05:29 -
> @@ -5988,6 +5988,7 @@ pf_route(struct pf_pdesc *pd, struct pf_
>   if ((r->rt == PF_REPLYTO) == (r->direction == pd->dir))
>   return;
>   m0 = pd->m;
> + pd->m = NULL;
>   }
>  
>   if (m0->m_len < sizeof(struct ip)) {
> @@ -6108,8 +6109,6 @@ pf_route(struct pf_pdesc *pd, struct pf_
>   ipstat_inc(ips_fragmented);
>  
>  done:
> - if (r->rt != PF_DUPTO)
> - pd->m = NULL;
>   rtfree(rt);
>   return;
>  
> @@ -6146,6 +6145,7 @@ pf_route6(struct pf_pdesc *pd, struct pf
>   if ((r->rt == PF_REPLYTO) == (r->direction == pd->dir))
>   return;
>   m0 = pd->m;
> + pd->m = NULL;
>   }
>  
>   if (m0->m_len < sizeof(struct ip6_hdr)) {
> @@ -6237,8 +6237,6 @@ pf_route6(struct pf_pdesc *pd, struct pf
>   }
>  
>  done:
> - if (r->rt != PF_DUPTO)
> - pd->m = NULL;
>   rtfree(rt);
>   return;
>  
>  
> 



Re: [External] : Re: pf route-to issues

2021-01-26 Thread Alexandr Nedvedicky
Hello,

On Tue, Jan 26, 2021 at 12:33:25PM +0100, Alexander Bluhm wrote:
> On Tue, Jan 26, 2021 at 10:39:30AM +1000, David Gwynne wrote:
> > > But what about dup-to?  The packet is duplicated for both directions.
> > > I guess the main use case for dup-to is implementing a monitor port.
> > > There you have to pass packets stateless, otherwise it would not
> > > work anyway.  The strange semantics is not related to this diff.
> >
> > are you saying i should skip pf_test for all dup-to generated packets?
> 
> I am not sure.
> 
> When we have an in dup-to rule, the incoming packets in request
> direction are dupped and tested with the out ruleset.  The reply
> packets for this state are also dupped, but not tested when they
> leave the dup interface.
> 
> This is inconsistent and cannot work statefully.  Stateful filtering
> with dupped packets does not make sense anyway.  The only working
> config is "pass out on dup-interface no state".
> 
> Do we think this rule should be required?
> 
> 1. No packet should leave an interface without a rule.
> 
> if (pd->dir == PF_IN || s->rt == PF_DUPTO) {
> if (pf_test(AF_INET, PF_OUT, ifp, ) != PF_PASS)
> 
> 2. The config says we want a monitor port.  We risk that the
>original packet and the dupped packet match the same rule.
>Stateful filtering cannot work, we do not expect reply packets
>for the dups.
> 
> if (pd->dir == PF_IN && s->rt != PF_DUPTO) {
> if (pf_test(AF_INET, PF_OUT, ifp, ) != PF_PASS)
> 
> 3. Some sort of problem was there before, but different.  Don't
>address it now.
> 

another option would be to mark duped packet as GENERATED
to short circuit pf_test() here:

6871 
6872 if ((*m0)->m_pkthdr.pf.flags & PF_TAG_GENERATED)
6873 return (PF_PASS);
6874 

perhaps excluding those changes from current diff is good.
this seems to be separate issue anyway.

thanks and
regards
sashan



Re: [External] : Re: pf route-to issues

2021-01-26 Thread Alexandr Nedvedicky
Hello,

On Tue, Jan 26, 2021 at 10:39:30AM +1000, David Gwynne wrote:
> On Mon, Jan 25, 2021 at 04:19:11PM +0100, Alexander Bluhm wrote:
> > On Fri, Jan 22, 2021 at 06:07:59PM +1000, David Gwynne wrote:
> > > --- sys/conf/GENERIC  30 Sep 2020 14:51:17 -  1.273
> > > +++ sys/conf/GENERIC  22 Jan 2021 07:33:30 -
> > > @@ -82,6 +82,7 @@ pseudo-device   msts1   # MSTS line discipl
> > >  pseudo-deviceendrun  1   # EndRun line discipline
> > >  pseudo-devicevnd 4   # vnode disk devices
> > >  pseudo-deviceksyms   1   # kernel symbols device
> > > +pseudo-devicekstat
> > >  #pseudo-device   dt  # Dynamic Tracer
> > >
> > >  # clonable devices
> > 
> > This is an unrelated chunk.
> 
> oh yeah...
> 
> > > +pf_route(struct pf_pdesc *pd, struct pf_state *s)
> > ...
> > > + if (pd->dir == PF_IN) {
> > >   if (pf_test(AF_INET, PF_OUT, ifp, ) != PF_PASS)
> > 
> > Yes, this is the correct logic.  When the packet comes in, pf
> > overrides forwarding, tests the out rules, and sends it.  For
> > outgoing packets on out route-to rules we have already tested the
> > rules.  It also works for reply-to the other way around.
> 
> yep.

I'm suggesting to keep current if () in. and change it with follow
up commit, which will adjust match rule for route-to/reply-to.
would it be OK with you, David?

> 
> > But what about dup-to?  The packet is duplicated for both directions.
> > I guess the main use case for dup-to is implementing a monitor port.
> > There you have to pass packets stateless, otherwise it would not
> > work anyway.  The strange semantics is not related to this diff.
> 
> are you saying i should skip pf_test for all dup-to generated packets?

this makes perfect sense for me.

> 
> > We are reaching a state where this diff can go in.  I just startet
> > a regress run with it.  OK bluhm@
> 
> hopefully i fixed the pfctl error messages up so the regress tests arent
> too unhappy.


thanks and
regards
sashan



Re: [External] : Re: pf route-to issues

2021-01-26 Thread Alexandr Nedvedicky
Hello,


> > >   goto bad;
> > 
> > here we do 'goto bad', which does not call if_put().
> 
> yes it does. the whole chunk with the diff applied is:
> 
> done:
> if (s->rt != PF_DUPTO)
> pd->m = NULL;
> if_put(ifp);
> rtfree(rt);
> return;
> 
> bad:
> m_freem(m0);
> goto done;
> }
> 
> bad drops the mbuf and then goes to done.

yes you are absolutely right, I need glasses.

thanks and
regards
sashan



Re: [External] : Re: pf route-to issues

2021-01-26 Thread Alexandr Nedvedicky
Hello,

> 
> i'll need help with "match on em0 route-to $if_em0_peer". or we can do
> that later separately?

may be can we keep this line in pf_route() untouched at least for now:

6041 
6042 if (pd->kif->pfik_ifp != ifp) {
6043 if (pf_test(AF_INET, PF_OUT, ifp, ) != PF_PASS)
6044 goto bad;
6045 else if (m0 == NULL)
6046 goto done;
6047 if (m0->m_len < sizeof(struct ip)) {
6048 DPFPRINTF(LOG_ERR,
6049 "%s: m0->m_len < sizeof(struct ip)", __func__);
6050 goto bad;
6051 }
6052 ip = mtod(m0, struct ip *);
6053 }
6054 

I think if () at line 6042 does not hurt pfsync(4). This should be removed
with commit, which will introduce the 'match ... route-to'. There should be
more detailed explanation in my response to email from bluhm@.

thanks and
regards
sashan



Re: [External] : Re: pf route-to issues

2021-01-26 Thread Alexandr Nedvedicky
Hello,


> > 
> > 
> > I'm not sure if proposed scenario real. Let's assume there
> > is a PF box with three NICs running on this awkward set up
> > 
> > em1 ... 192.168.1.10
> > 
> > em0
> > 
> > em2 ... 192.168.1.10
> > 
> > em0 is attached to LAN em1 and em2 are facing to internet which is
> > reachable with two different physical lines. both lines are connected 
> > via
> > equipment, which uses fixed IP address 192.168.1.10 and PF admin has
> > no way to change that.
> 
> in this scenario are em1 and em2 connected to the same ethernet
> segment and ip subnet?

em1 and em2 are connected to different wires (broadcast domains).

and I agree this set up is really awkward. not sure if it will work
because it also depends on how ARP table is organized. I think it
works on Solaris, but I have not tried that with OpenBSD.

> 
> > the 'address@interface' syntax is the only way to define rules:
> > 
> > pass in on em0 from 172.16.0.0/16 route-to 192.168.1.10@em1
> > pass in on em0 from 172.17.0.0/16 route-to 192.168.1.10@em2
> > 
> > regardless of how much real such scenario is I believe it can
> > currently work.
> 
> this is a very awkward configuration. while i think what it's trying
> to do is useful, how it is expressed relies on something i want to
> break (fix?).

yes I completely agree. I believe this 'awkward thinking' stems
back to ipf (ipfilter). I found my ancient notes to find some details
on this. My notes refer to ipf.conf(5) manpage [1].

If I understand the purpose of 'address@interface' syntax right,
it allows us to forward packets using explicit pair [ next-hop IP,
interface ]. Thus firewall can emulate a route, which might be missing in
routing table. It just 'blindly' sends matching packet to next-hop using
desired interface assuming that 'next-hop' host will do the right thing
with such packet.

> 
> one of the original reasons i wanted to break this kind of config
> is because pfsync hasn't got a way to exchange interace information,
> and different firewalls are going to have different interface
> topologies anyway. one of the reasons to only use a destination/next
> hop as the argument to route-to rules was so pfsync would work.

yes I understand that. I'm not able to judge how many currently working
configurations will remain broken after we kill 'address@interface' form. 
I'm sure many deployments will get fixed with simple tweak:
echo 'address@interface'|cut -d @ -f 1
but I'm not sure how many people do still need to order desired NIC. 

> 
> i'm pretty sure this is broken at the moment because of bugs in the
> routing code. it is possible to configure routes to 192.168.1.10
> via both em1 and em2 if net.inet.ip.multipath is set to 1, but im
> sure the llinfo (arp and rtable) part of this kind of multipath
> route setup does not work reliably. i guess i should try and get
> my fixes for this into the tree.
> 
> there are two alternate ways i can think of to do this. the first
> is to configure an rtable for each interface:
> 
>   # route -T 1 add default 192.168.1.10 -ifp em1
>   # route -T 2 add default 192.168.1.10 -ifp em2
> 
> then you could write rules like this:
> 
>   pass in on em0 from 172.16.0.0/16 rtable 1
>   pass in on em0 from 172.17.0.0/16 rtable 2
> 
> the other is to add add routes "beyond" 192.168.1.10 and route-to
> them:
> 
>   # route add 127.0.1.10 192.168.1.10 -ifp em1
>   # route add 127.0.2.10 192.168.1.10 -ifp em2
> 
> then you can write rules like this:
> 
>   pass in on em0 from 172.16.0.0/16 route-to 127.0.1.10
>   pass in on em0 from 172.17.0.0/16 route-to 127.0.2.10
> 
> this will likely hit the same bugs in the rtable/arp code i referred
> to above though.
> 
> also, note that i haven't tested either of these.
> 

I think this sounds like a plan how to deal with edge cases,
which can't get fixed with simple 'cut -d @ -f 1'

thanks and
regards
sashan

[1] https://www.freebsd.org/cgi/man.cgi?query=ipf.conf=5=0

(search for reply-to)



Re: [External] : Re: pf route-to issues

2021-01-25 Thread Alexandr Nedvedicky
Hello,

pf_route() might leak a refence to ifp.

> Index: sys/net/pf.c
> ===
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1101
> diff -u -p -r1.1101 pf.c
> --- sys/net/pf.c  19 Jan 2021 22:22:23 -  1.1101
> +++ sys/net/pf.c  22 Jan 2021 07:33:31 -



> @@ -5998,48 +5994,40 @@ pf_route(struct pf_pdesc *pd, struct pf_
>  
>   ip = mtod(m0, struct ip *);
>  

> +
> + ifp = if_get(rt->rt_ifidx);
>   if (ifp == NULL)
>   goto bad;

here we get a reference to ifp.

>  
> - if (pd->kif->pfik_ifp != ifp) {
> + /* A locally generated packet may have invalid source address. */
> + if ((ntohl(ip->ip_src.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET &&
> + (ifp->if_flags & IFF_LOOPBACK) == 0)
> + ip->ip_src = ifatoia(rt->rt_ifa)->ia_addr.sin_addr;
> +
> + if (pd->dir == PF_IN) {
>   if (pf_test(AF_INET, PF_OUT, ifp, ) != PF_PASS)
>   goto bad;
>   else if (m0 == NULL)
> @@ -6052,16 +6040,6 @@ pf_route(struct pf_pdesc *pd, struct pf_
>   ip = mtod(m0, struct ip *);
>   }
>  
> - rt = rtalloc(sintosa(dst), RT_RESOLVE, rtableid);
> - if (!rtisvalid(rt)) {
> - ipstat_inc(ips_noroute);
> - goto bad;
> - }
> - /* A locally generated packet may have invalid source address. */
> - if ((ntohl(ip->ip_src.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET &&
> - (ifp->if_flags & IFF_LOOPBACK) == 0)
> - ip->ip_src = ifatoia(rt->rt_ifa)->ia_addr.sin_addr;
> -
>   in_proto_cksum_out(m0, ifp);
>  
>   if (ntohs(ip->ip_len) <= ifp->if_mtu) {
> @@ -6082,9 +6060,9 @@ pf_route(struct pf_pdesc *pd, struct pf_
>*/
>   if (ip->ip_off & htons(IP_DF)) {
>   ipstat_inc(ips_cantfrag);
> - if (r->rt != PF_DUPTO)
> + if (s->rt != PF_DUPTO)
>   pf_send_icmp(m0, ICMP_UNREACH, ICMP_UNREACH_NEEDFRAG,
> - ifp->if_mtu, pd->af, r, pd->rdomain);
> + ifp->if_mtu, pd->af, s->rule.ptr, pd->rdomain);
>   goto bad;

here we do 'goto bad', which does not call if_put().

>   }
>  
> @@ -6108,8 +6086,9 @@ pf_route(struct pf_pdesc *pd, struct pf_
>   ipstat_inc(ips_fragmented);
>  
>  done:
> - if (r->rt != PF_DUPTO)
> + if (s->rt != PF_DUPTO)
>   pd->m = NULL;
> + if_put(ifp);
>   rtfree(rt);
>   return;
>  


pf_route6() suffers from the same issue.


thanks and
regards
sashan



Re: [External] : Re: pf route-to issues

2021-01-25 Thread Alexandr Nedvedicky
Hello,


On Mon, Jan 25, 2021 at 03:21:29PM +0100, Alexander Bluhm wrote:
> Hi,
> 
> Some personal thoughts.  I am happy when pf route-to gets simpler.
> Especially I have never understood what this address@interface
> syntax is used for.
> 
> I cannot estimate what configuration is used by our cutomers in
> many installations.  Simple syntax change address@interface ->
> address of next hob should be no problem.  Slight semantic changes
> have to be dealt with.  Current packet flow is complicated and may
> be inspired by old NAT behavior.  As long it becomes more sane and
> easier to understand, we should change it.


I'm not sure if proposed scenario real. Let's assume there
is a PF box with three NICs running on this awkward set up

em1 ... 192.168.1.10

em0

em2 ... 192.168.1.10

em0 is attached to LAN em1 and em2 are facing to internet which is
reachable with two different physical lines. both lines are connected via
equipment, which uses fixed IP address 192.168.1.10 and PF admin has
no way to change that.

the 'address@interface' syntax is the only way to define rules:

pass in on em0 from 172.16.0.0/16 route-to 192.168.1.10@em1
pass in on em0 from 172.17.0.0/16 route-to 192.168.1.10@em2

regardless of how much real such scenario is I believe it can
currently work.



> 
> But I don't like artificial restrictions.  We don't know all use
> cases.  reply-to and route-to could be used for both in and out
> rules.  I have used them for strange divert-to on bridge setups.
> It should stay that way.
> 

OK I agree.


regards
sashan



Re: [External] : Re: pf route-to issues

2021-01-25 Thread Alexandr Nedvedicky
hello,


> 
> > +pf_route(struct pf_pdesc *pd, struct pf_state *s)
> ...
> > +   if (pd->dir == PF_IN) {
> > if (pf_test(AF_INET, PF_OUT, ifp, ) != PF_PASS)
> 
> Yes, this is the correct logic.  When the packet comes in, pf
> overrides forwarding, tests the out rules, and sends it.  For
> outgoing packets on out route-to rules we have already tested the
> rules.  It also works for reply-to the other way around.

the change above changes the behavior from 'do pf_test() when packet
crosses interface' to 'call pf_test() at most two times'. the change
essentially breaks the behavior I've mentioned earlier.

table  { 10.10.10.10, 172.16.1.1 }

pass out on em0 from 192.168.1.0/24 to any route-to 
pass out on em1 from 192.168.1.0 to any nat-to (em1)
pass out on em2 all 

the story goes as follows:
destination/next-hop 10.10.10.10 is reached via em1
destination/next-hop 172.16.1.1 is reached via em2

for arbitrary reason we have to do NAT on em1 interface.
with current PF things do work. trading current 'if (...)'
for 'if (pd->dir == PF_IN)' breaks it. The NAT won't happen.

I think this can be solved by using 'match ... route-to ...' rule suggested by
David. However...  I actually start to wonder does it hurt us to keep current
behavior? Why do we want to change it? I'm getting worried this particular
tid-bit will bring more harm than good. It also does not help much pfsync(4),
if I understand things right.


> 
> But what about dup-to?  The packet is duplicated for both directions.
> I guess the main use case for dup-to is implementing a monitor port.
> There you have to pass packets stateless, otherwise it would not
> work anyway.  The strange semantics is not related to this diff.

this is a good point.


thanks and
regards
sahan



Re: [External] : Re: pf route-to issues

2021-01-25 Thread Alexandr Nedvedicky
Hello,


> > 
> > If I understand the idea right, then basically 'match out on em0'
> > figures out the new 'outbound interface' so either 'pass out on em1...' 
> > or
> > 'pass out on em2...' will kick in. In other words:
> > 
> > depending on the destination picked up from  table,
> > the route-to action will override the em0 interface to
> > either em1 or em2.
> 
> yes.
> 
> i dont understand the kif lifetimes though. can we just point a
> pdesc at an arbitrary kif or do we need ot reference count?
> 

as long as we don't release NET_LOCK() (or PF_LOCK() in near future),
the reference to kif should remain valid.

kif is very own PF's abstraction of network interfaces. there are
two ways how one can create a kif:

there is either rule loaded, which refers to interface, which
is not plumbed yet.

the new interface gets plumbed and PF must have a kif for it.

the reference count currently used makes sure we destroy kif when
either interface is gone or when the rule, which refers kif is
gone.

hope I remember details above right.


> > I think this might be way to go.
> > 
> > My only concern now is that such change is subtle. I mean the
> > 
> > pass out ... route-to 
> > 
> > will change behavior in sense that current code will dispatch
> > packet to new interface and run pf_test() again. Once your diff
> > will be in the same rule will be accepted, but will bring entirely
> > different behaviour: just dispatch packet to new interface.
> 
> yeah.
> 
> the counter example is that i was extremely surprised when i
> discovered that pf_test gets run again when the outgoing interface
> is changed with a route-to rule.

surprised because you've forgot about current model? which is:
run pf_test() whenever packet crosses the interface?

> 
> there's subtlety either way, we're just figuring out which one we're
> going for.

yes exactly. there are trade offs either way.



> > I think this is acceptable. If this will cause a friction we can always
> > adjust the code in follow up commit to allow state-less 
> > route-to/reply-to
> > with no support from pfsync(4).
> 
> if we're going to support route-to on match rules i think this will be
> easy to implement. 
> 

I think there must be some broader consent on model change
from current which is run pf_test() for every NIC crossing
to new way, which is run pf_test() at most two times.


> > > 
> > > > > 
> > > > > lastly, the "argument" or address specified with route-to (and
> > > > > reply-to and dup-to) is a destination address, not a next-hop. this
> > > > > has been discussed on the lists a couple of times before, so i won't
> > > > > go over it again, except to reiterate that it allows pf to force
> > > > > "sticky" path selection while opening up the possibility for ecmp
> > > > > and failover for where that path traverses.
> > > > 
> > > > I keep forgetting about it as I still stick to current 
> > > > interpretation.
> > > > 
> > > > 
> > > > I've seen changes to pfctl. Diff below still allows rule:
> > > > 
> > > > pass in on net0 from 192.168.1.0/24 to any route-to 10.10.10.10@em0
> > > 
> > > Is there use case for the @interface syntax apart from the current
> > > route-to rules? If not, we can just delete it.
> > 
> > perhaps I'm still not quite on the same page as you then. I also
> > had no time to entirely test you diff. The way I understand your
> > effort is to change route-to behavior such it will be using
> > a destination instead of next-hop@interface. Or are you planning
> > to keep current form ('route-to next-hop@interface') working?
> 
> if we ignore route-to, what's the use case for the interface part of
> address@interface? it doesnt seem to be accepted as part of an address
> in other parts of the grammar:
> 
> dlg@kbuild ~$ echo pass in from 192.168.0.0@vmx0 | sudo pfctl -nf -
> stdin:1: @if syntax not permitted in from or to
> stdin:1: skipping rule due to errors
> stdin:1: rule expands to no valid combination
> dlg@kbuild ~$ echo pass from 192.168.0.0@vmx0 | sudo pfctl -nf -
> stdin:1: @if syntax not permitted in from or to
> stdin:1: skipping rule due to errors
> stdin:1: rule expands to no valid combination
> dlg@kbuild ~$ echo pass nat-to 192.168.0.0@vmx0 | sudo pfctl -nf -
> stdin:1: @if not permitted
> stdin:1: nat-to and rdr-to require a direction
> stdin:1: skipping rule due to errors
> stdin:1: rule expands to no valid combination
> dlg@kbuild ~$ echo pass nat-to 192.168.0.0@vmx0 | sudo pfctl -nf - 
> 
> if route-to isn't going to use it, can we cut the @interface part out of
> fthe address parser?
> 

that's also my understanding. the form 192.168.0.0@net0 is currently
used by route-to et.al. only. So we can eventually let it go.

however there is a strong objection from Sven, which wants it to keep
it around.


> > unless we 

Re: [External] : Re: pf route-to issues

2021-01-25 Thread Alexandr Nedvedicky
Hello,


> > 
> > I understand that simple is better here, so I won't object
> > if we will lean towards simplified model above. However I still
> > would like to share my view on current PF.
> > 
> > the way I understand how things (should) work currently is fairly 
> > simple:
> > 
> > we always run pf_test() as packet crosses interface.
> > packet can cross interface either in outbound or
> > inbound direction.
> 
> That's how I understand the current code. I'm proposing that we change
> the semantics so they are:
> 
> - we always run pf_test as a packet enters or leaves the network stack.
> - pf is able to filter or apply policy based on various attributes
>   of the packet such as addresses and ports, but also metadata about
>   the packet such as the current prio, or the interface it came
>   from or is going to.
> - changing a packet or it's metadata does not cause a rerun of pf_test.
> - route-to on an incoming packet basically bypasses the default
>   stack processing with a "fast route" out of the stack.
> 
> > this way we can always create a complex route-to loops,
> > however it can also solve some route-to vs. NAT issues.
> > consider those fairly innocent rules:
> > 
> > 8<---8<---8<--8<
> > table  { 10.10.10.10, 172.16.1.1 }
> > 
> > pass out on em0 from 192.168.1.0/24 to any route-to 
> > pass out on em1 from 192.168.1.0 to any nat-to (em1)
> > pass out on em2 all 
> > 8<---8<---8<--8<
> > 
> > Rules above should currently work, but will stop if we will
> > go with simplified model.
> 
> The entries in  make the packet go out em1 and em2?

yes they do. let's say 10.10.10.10 is reached over em1, 172.16.1.1 is
reached over em2. sorry I have not specified that in my earlier email.

> 
> I'm ok with breaking configs like that. We don't run pf_test again for
> other changes to the packet, so if we do want to support something like
> that I think we should make the following work:
> 
>   # pf_pdesc kif is em0
>   match out on em0 from 192.168.1.0/24 to any route-to 
>   # pf_pdesc kif is now em1
>   pass out on em1 from 192.168.1.0 to any nat-to (em1)
>   pass out on em2 all
> 
> This is more in line with how NAT rules operate.

If I understand the idea right, then basically 'match out on em0'
figures out the new 'outbound interface' so either 'pass out on em1...' or
'pass out on em2...' will kick in. In other words:

depending on the destination picked up from  table,
the route-to action will override the em0 interface to
either em1 or em2.

I think this might be way to go.

My only concern now is that such change is subtle. I mean the

pass out ... route-to 

will change behavior in sense that current code will dispatch
packet to new interface and run pf_test() again. Once your diff
will be in the same rule will be accepted, but will bring entirely
different behaviour: just dispatch packet to new interface.


> 
> > I'll be OK with your simplified model if it will make things
> > more explicit:
> > 
> > route-to option should be applied on inbound rules
> > only
> 
> This would restrict how we currently write rules. See below about how we
> would be using it.
> 
> > reply-to option should be applied on outbound rule
> > only
> 
> I'm using reply-to on inbound rules. On these boxes I have a service
> (it's a dns resolver running unbound) that is accessible only via
> gre(4) tunnels, and I need the replies to those connections to go
> out the same interface they came in on. I'm running an older version of
> my diff, so I can have rules like this to make it work:
> 
>   pass in quick on gre0 reply-to gre0:peer
>   pass in quick on gre1 reply-to gre1:peer
> 
> The DNS traffic isn't going through this box, the replies that
> unbound is generating match the state created by the inbound rule.
> 
> If I'm remembering correctly, sthen@ had a similar use case.

you are right, I did not think much of a local bound traffic.
in this case reply-to needs to be kept as-is.

> 
> > dup-to option can go either way (in/out)
> 
> Yep.
> 
> > does it make sense? IMO yes, because doing route-to
> > on outbound path feels unnatural to me.
> 
> I agree that it feels a bit unnatural, but so far all the route-to rules
> I've been writing have been on pass out rules. That could be peculiar to
> my setup, but we generally allow packets in on our external links, and
> apply policy on the outbound interface heading towards the relevant
> service. eg:
> 
>   block
>   pass in on $if_external
>   pass out on $if_webservers proto tcp to port { http https }
>   pass out on $if_relays proto { tcp udp } to port domain
> 
> We'd be sprinkling route-to on these pass out rules to tie connections
> to specific backends.
> 
> > 
> > 
> > 
> > > 
> > > 

Re: [External] : Re: pf route-to issues

2021-01-24 Thread Alexandr Nedvedicky
Hello,

> 
> ok. i don't know how to split up the rest of the change though.
> 
> here's an updated diff that includes the rest of the kernel changes and
> the pfctl and pf.conf tweaks.
> 
> it's probably useful for me to try and explain at a high level what
> i think the semantics should be, otherwise we might end up arguing about
> which bits of the current config i broke.
> 
> so, from an extremely high level point of view, and apologies if
> this is condescending, pf sits between the network stack and an
> interface that a packet travels on. for connections handled by the
> local box, this means packets come from the stack and get an output
> interface selected by a route lookup, then pf checks it, and then
> it goes out the selected interface. replies come into an interface,
> get checked by pf, and then enter the stack. when forwarding, a
> packet comes into an interface, pf checks it, the stack does a route
> lookup to pick an interface, pf checks it again, and then it goes
> out the interface.
> 
> so what does it mean when route-to (or reply-to) gets involved? i'm
> saying that when route-to is applied to a packet, pf takes the packet
> away from the stack and immediately forwards it toward to specified
> destination address. for a packet entering the system, ie, when the
> packet is going from the interface into the stack, route-to should
> pretend that it is forwarding the packet and basically push it
> straight out an interface. however, like normal forwarding via the
> stack, there might be some policy on packets leaving that interface that
> you want to apply, so pf should run pf_test in that situation so the
> policy can be applied. this is especially useful if you need to apply
> nat-to when packets leave a particular interface.
> 
> however, if you route-to when a packet is on the way out of the
> stack, i'm arguing that pf should not run again against that packet.
> currently route-to rules run pf_test again if the interface the packet
> is routed out of changes, which means pf runs multiple times against a
> packet if rules keep changing which interface it goes out. this means
> there's loop prevention in pf to mitigate against this, and weird
> potentials for multiple states to be created when nat gets involved.
> 
> for simplicity, both in terms of reasoning and code i think pf should
> only be run once when a packet enters the system, and only once when it
> leaves the system. the only reason i can come up with for running
> pf_test multiple times when route-to changes the outgoing interface is
> so you can check the packet with "pass out on $new_if" type rules. we
> don't rerun pf again when nat/rdr changes addresses, so this feels
> inconsistent to me.

I understand that simple is better here, so I won't object
if we will lean towards simplified model above. However I still
would like to share my view on current PF.

the way I understand how things (should) work currently is fairly simple:

we always run pf_test() as packet crosses interface.
packet can cross interface either in outbound or
inbound direction.

this way we can always create a complex route-to loops,
however it can also solve some route-to vs. NAT issues.
consider those fairly innocent rules:

8<---8<---8<--8<
table  { 10.10.10.10, 172.16.1.1 }

pass out on em0 from 192.168.1.0/24 to any route-to 
pass out on em1 from 192.168.1.0 to any nat-to (em1)
pass out on em2 all 
8<---8<---8<--8<

Rules above should currently work, but will stop if we will
go with simplified model.

I'll be OK with your simplified model if it will make things
more explicit:

route-to option should be applied on inbound rules
only

reply-to option should be applied on outbound rule
only

dup-to option can go either way (in/out)

does it make sense? IMO yes, because doing route-to
on outbound path feels unnatural to me.



> 
> this also breaks the ability to do route-to without states. is there a
> reason to do that apart from the DSR type things? did we agree that
> those use cases could be handled by sloppy states instead?

If I remember correct we need to make 'keep state' mandatory
for route-to so it can work well with pfsync(4), right?

> 
> lastly, the "argument" or address specified with route-to (and
> reply-to and dup-to) is a destination address, not a next-hop. this
> has been discussed on the lists a couple of times before, so i won't
> go over it again, except to reiterate that it allows pf to force
> "sticky" path selection while opening up the possibility for ecmp
> and failover for where that path traverses.

I keep forgetting about it as I still stick to current interpretation.


I've seen changes to pfctl. Diff below still allows rule:

pass in on net0 from 192.168.1.0/24 to any 

Re: [External] : Re: tell pfctl(8) route-to and reply-to accept next-hop only

2021-01-24 Thread Alexandr Nedvedicky
hello,

On Fri, Jan 22, 2021 at 05:32:47PM +1000, David Gwynne wrote:
> I tried this diff, and it broke the ability to use dynamic addresses.
> ie, the following rules should work:
> 
> pass in on gre52 inet proto icmp route-to (gre49:peer)
> pass in on vmx0 inet proto icmp route-to (gre:peer)

I see, I did not know those should work.
> 
> however, other forms of dynamic interface addresses should fail. or do
> we want to support route-to if0:broadcast too?

I can't think of any valid reason why 'ifp0:broadcast' should work.  this
seems to be poor hack to work around some awkward glitch.  I would prefer
to disable this option now. We can always add it later, when we will
understand the true purpose.


thanks and
regards
sashan



Re: tcpdump pflog af and rewritten addresses

2021-01-19 Thread Alexandr Nedvedicky
Hello,

On Mon, Jan 18, 2021 at 07:47:56PM +0100, Alexander Bluhm wrote:
> Hi,
> 
> tcpdump pflog with addresses rewritten by rdr-to, nat-to, or af-to
> is broken.
> 
> 1. Fix address family of the packet in af-to rules:
> 
> before:
> 19:26:37.620926 169.254.0.14 > 169.254.0.14: icmp: echo request
> 19:26:37.620946 bad-ip6-version 4
> 19:26:37.620963 fc00::23 > fc00::24: icmp6: echo request
> 19:26:37.620977 bad-ip-version 6
> 
> after:
> 19:26:29.606966 169.254.0.14 > 169.254.0.14: icmp: echo request
> 19:26:29.606990 169.254.0.14 > 169.254.0.14: icmp: echo request
> 19:26:29.607006 fc00::23 > fc00::24: icmp6: echo request
> 19:26:29.607019 fc00::24 > fc00::23: icmp6: type-#0
> 
> The type-#0 is still buggy, but it is a step in the right direction.
> 
> 2. Print the addresses that were rewritten if called with -ev:
> 
> This is rdr-to.  Note that "orig src" is the modified address.
> 
> before:
> 19:32:34.843807 rule 2.regress.19/(match) [uid 0, pid 37810] pass out on 
> lo11: [orig src 169.254.0.22:59443, dst 169.254.0.12:9] 169.254.0.12.42793 > 
> 169.254.0.12.9: [bad udp cksum 0900! -> 152f] udp 4 (ttl 64, id 11544, len 
> 32, bad ip cksum c! -> f9a0)
> 
> after:
> 19:32:06.794193 rule 2.regress.19/(match) [uid 0, pid 37810] pass out on 
> lo11: [rewritten: src 169.254.0.22:52093, dst 169.254.0.12:9] 
> 169.254.0.12.1885 > 169.254.0.12.9: [bad udp cksum 27aa! -> e1ce] udp 4 (ttl 
> 64, id 5110, len 32, bad ip cksum c! -> 12c3)
> 
> With af-to the old code confuses the address family:
> 
> before:
> 19:33:45.731267 rule 2.regress.22/(match) [uid 0, pid 37810] pass in on lo11: 
> [orig src 252.0.0.0:64597, dst 252.0.0.0:9] [|ip6]
> 
> after:
> 19:34:05.388153 rule 2.regress.22/(match) [uid 0, pid 37810] pass in on lo11: 
> [rewritten: src fc00::23:65141, dst fc00::24:9] 169.254.0.14.10027 > 
> 169.254.0.14.9: [udp sum ok] udp 4 (ttl 64, id 27481, len 32)
> 
> Basically the kernel uses the information from the packet description
> and fills it into the fields in the pflog header.  While doing this,
> it is trival to figure out whether the packet has been rewritten.
> 
> ok?
> 

diff makes sense to me.


OK sashan@



Re: pflog remove translation

2021-01-19 Thread Alexandr Nedvedicky
Hello,


On Mon, Jan 18, 2021 at 04:47:08PM +0100, Alexander Bluhm wrote:
> Hi,
> 
> pflog(4) tries to log the translated packet with rdr-to, nat-to,
> and af-to applied.  Therefore it creates a mbuf chain on the stack
> with a partial copy.  This might have been a good idea for plain
> IPv4 10 years ago.  But now the concept fails miserably due to:
> 
> - IP options
> - extension header
> - NAT46 af-to
> - fragmented mbuf chains
> 
> Even the plain IPv4 case does not work.  Usually the length checks
> in pf_setup_pdesc() reject the faked mbuf on the stack and we goto
> copy.  Some special cases call pf_translate() and it fails miserably.
> syzkaller has found such a case.
> 
> https://urldefense.com/v3/__https://syzkaller.appspot.com/bug?id=9415f8d0a6f176a629daaa910c431498f5e8aa99__;!!GqivPVa7Brio!L_T-G9Xpq8csLhGKsq5JBtP3aiZTolKPbjXvKMYTPeU-hSLuh2QEfd5HgHl7Pt_MSP3vnGNQ$
>  
> 
> After trying to understand this code for a week, I came to the
> conclusion that it is broken beyond repair.  The funny part is,
> when I remove pflog_mtap(), the pflog output in the cases tested
> by regress/sys/net/pflog stays the same.

I've tried a 'log' and 'log all' variant with rule:

pass out log (all) on vio1 inet from vio2:network to any flags S/SA 
nat-to (vio1) round-robin

and I also could spot _no_ difference in 'tcpdump -i pflog0' output.
similar test with rdr-to also has not revealed any difference.


> 
> Remove this undead code to log the packet as it is.
> 
> ok?
> 

simple is better.

OK sashan@



Re: tell pfctl(8) route-to and reply-to accept next-hop only

2021-01-15 Thread Alexandr Nedvedicky
Hello,

On Fri, Jan 15, 2021 at 06:26:48PM +0100, Alexander Bluhm wrote:
> On Tue, Jan 12, 2021 at 08:45:22PM +0100, Alexandr Nedvedicky wrote:
> > I think bluhm@ and dlg@ have committed part of that change already.
> 
> I have only commited a refactoring change.  Next step in kernel
> would be to remove the check in pf_find_state() and see what happens.
> 
> I was waiting for dlg@ to do it, but maybe he waited for me.
> 
> Index: net/pf.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/pf.c,v
> retrieving revision 1.1098
> diff -u -p -r1.1098 pf.c
> --- net/pf.c  14 Jan 2021 09:44:33 -  1.1098
> +++ net/pf.c  15 Jan 2021 16:46:42 -
> @@ -1122,12 +1122,6 @@ pf_find_state(struct pf_pdesc *pd, struc
>   }
> 
>   *state = s;
> - if (pd->dir == PF_OUT && s->rt_kif != NULL && s->rt_kif != pd->kif &&
> - ((s->rule.ptr->rt == PF_ROUTETO &&
> - s->rule.ptr->direction == PF_OUT) ||
> - (s->rule.ptr->rt == PF_REPLYTO &&
> - s->rule.ptr->direction == PF_IN)))
> - return (PF_PASS);
> 
>   return (PF_MATCH);
>  }
> 

please go ahead and commit the diff to pf_find_state() above.

> > the proposed diff updates pfctl(8) so parser will do 'a right thing',
> 
> Does it work without the kernel changes from dlg@ ?

no it does not. my branch is ahead of tree. I've lost a track.
sorry for being impatient, creating more confusion here.

> 
> > the diff also breaks existing regression tests. We can update
> > them once, we will agree on proposed diff.
> 
> I have adapted my regress pf.conf and regress/sys/net/pf_forward
> fails in the route-to test.  It worked with dlg@'s diff.  So your
> standalone pfctl change does not seem to be sufficient.
> 

my diff is ahead of time. I'll resend, once tree will be ready.

regards
sashan



tell pfctl(8) route-to and reply-to accept next-hop only

2021-01-12 Thread Alexandr Nedvedicky
Hello,

proposed diff follows stuff discussed here [1] (pf route-to issues).  I think
we've reached a consensus to change route-to/reply-to such the only supported
option will be next-hop (and list and table of next-hop addresses).

I think bluhm@ and dlg@ have committed part of that change already.
the proposed diff updates pfctl(8) so parser will do 'a right thing',
namely:

specifying host using form as 1.2.3.4@em0 is not supporte
anymore

diff introduces a next_hop() function, which is a clone of
existing host(). unlike host(), the next_hop() does not accept
a name of local network interface.

the diff also breaks existing regression tests. We can update
them once, we will agree on proposed diff.

thanks and
regards
sashan

[1] https://marc.info/?l=openbsd-tech=160308583701259=2

8<---8<---8<--8<
diff --git a/sbin/pfctl/parse.y b/sbin/pfctl/parse.y
index 2b3e62b1a7e..536aec3286b 100644
--- a/sbin/pfctl/parse.y
+++ b/sbin/pfctl/parse.y
@@ -3745,23 +3745,13 @@ pool_opt: BITMASK   {
;
 
 route_host : STRING{
-   /* try to find @if0 address specs */
-   if (strrchr($1, '@') != NULL) {
-   if (($$ = host($1, pf->opts)) == NULL)  {
-   yyerror("invalid host for route spec");
-   YYERROR;
-   }
+   if (($$ = next_hop($1, pf->opts)) == NULL)  {
+   /* error. "any" is handled elsewhere */
free($1);
-   } else {
-   $$ = calloc(1, sizeof(struct node_host));
-   if ($$ == NULL)
-   err(1, "route_host: calloc");
-   $$->ifname = $1;
-   $$->addr.type = PF_ADDR_NONE;
-   set_ipmask($$, 128);
-   $$->next = NULL;
-   $$->tail = $$;
+   yyerror("could not parse host specification");
+   YYERROR;
}
+   free($1);
}
| STRING '/' STRING {
char*buf;
@@ -3769,7 +3759,7 @@ route_host: STRING{
if (asprintf(, "%s/%s", $1, $3) == -1)
err(1, "host: asprintf");
free($1);
-   if (($$ = host(buf, pf->opts)) == NULL) {
+   if (($$ = next_hop(buf, pf->opts)) == NULL) {
/* error. "any" is handled elsewhere */
free(buf);
yyerror("could not parse host specification");
@@ -3795,33 +3785,6 @@ route_host   : STRING{
$$->next = NULL;
$$->tail = $$;
}
-   | dynaddr '/' NUMBER{
-   struct node_host*n;
-
-   if ($3 < 0 || $3 > 128) {
-   yyerror("bit number too big");
-   YYERROR;
-   }
-   $$ = $1;
-   for (n = $1; n != NULL; n = n->next)
-   set_ipmask(n, $3);
-   }
-   | '(' STRING host ')'   {
-   struct node_host*n;
-
-   $$ = $3;
-   /* XXX check masks, only full mask should be allowed */
-   for (n = $3; n != NULL; n = n->next) {
-   if ($$->ifname) {
-   yyerror("cannot specify interface twice 
"
-   "in route spec");
-   YYERROR;
-   }
-   if (($$->ifname = strdup($2)) == NULL)
-   errx(1, "host: strdup");
-   }
-   free($2);
-   }
;
 
 route_host_list: route_host optweight optnl{ 
diff --git a/sbin/pfctl/pfctl_parser.c b/sbin/pfctl/pfctl_parser.c
index 03317844e91..534b53b1198 100644
--- a/sbin/pfctl/pfctl_parser.c
+++ b/sbin/pfctl/pfctl_parser.c
@@ -1615,17 +1615,16 @@ host(const char *s, int opts)
 {
struct node_host*h = NULL, *n;
int  mask = -1;
-   char*p, *ps, *if_name;
+   char*p, *ps;
const char  *errstr;
 
+   if (strchr(s, '@') != NULL)

Re: pf log user and group

2021-01-11 Thread Alexandr Nedvedicky
Hello,

looks good to me.

OK sashan

On Mon, Jan 11, 2021 at 05:49:10PM +0100, Alexander Bluhm wrote:
> Hi,
> 
> Sometimes an uid is logged in pflog(4) although the logopt of the
> rule does not specify it.  Check the option again for the log rule
> in case another rule has triggered a socket lookup.  Remove logopt
> group, it is not documented and cannot work as struct pfloghdr does
> not contain a gid.  Rename PF_LOG_SOCKET_LOOKUP to PF_LOG_USER to
> express what it does.  The lookup involved is only an implemntation
> detail.
> 
> ok?
> 
> bluhm
> 



Re: pf route-to issues

2021-01-08 Thread Alexandr Nedvedicky
Hello,


> 
> revision 1.294
> date: 2003/01/02 01:56:56;  author: dhartmei;  state: Exp;  lines: +27 -49;
> When route-to/reply-to is used in combination with address translation,
> pf_test() may be called twice for the same packet. In this case, make
> sure the translation is only applied in the second call. This solves
> the problem with state insert failures where the second pf_test() call
> tried to insert another state entry after the first call's translation.
> ok henning@, mcbride@, thanks to Joe Nall for additional testing.
> 
> 
> I have tested your diffs in my setup, they all pass.  I have not
> tested the scenario mentioned in the commit message.  Note that the
> address translation implementation in 2003 was different from what
> we have now.  And sasha@'s analysis shows that the current code is
> wrong in other use cases.
> 

I've completely forgot there was a change in NAT. Therefore I could
not understand the commit message.



> 
> The only way to find out is to commit it.  It reduces comlexity that
> noone understands.
> 
> OK bluhm@ to remove the check
> 
> Please leave the "if (pd->kif->pfik_ifp != ifp)" around pf_test()
> in pf_route() as it is for now.

I agree with bluhm@ here. we should proceed with small steps in such
case and let things to settle down before making next move.


thanks and
regards
sashan



Re: pf route-to issues

2021-01-07 Thread Alexandr Nedvedicky
Hello,

sorry to come back after while...

On Tue, Jan 05, 2021 at 10:05:39PM +1000, David Gwynne wrote:
> On Mon, Jan 04, 2021 at 06:37:54PM +0100, Alexander Bluhm wrote:
> > On Mon, Jan 04, 2021 at 11:21:50PM +1000, David Gwynne wrote:
> > > this chunk pops out as a standalone change.
> > >
> > > having pf_find_state() return PF_PASS here means the callers short
> > > circuit and let the packet go through without running it through the
> > > a lot of the state handling, which includes things like protocol state
> > > updates, nat, scrubbing, some pflog handling, and most importantly,
> > > later calls to pf_route().
> > 
> > pf_route() calls pf_test() again with a different interface.
> > 
> > The idea of this code is, that the interface which is passed to
> > pf_test() from ip_output() is wrong.  The call to pf_set_rt_ifp()
> > changes it in the state.
> > 
> > In the pf_test() call from ip_output() we skip the tests.  We know
> > they will happen in pf_test() called from pf_route().  Without this
> > chunk we would do state handling twice with different interfaces.
> > 
> > Is that analysis correct?
> 
> I think so, but I didn't get as much time to poke at this today as I was
> hoping.
> 
> If the idea is to avoid running most of pf_test again if route-to is
> applied during ip_output, I think this tweaked diff is simpler. Is there
> a valid use case for running some of pf_test again after route-to is
> applied?
> 
> The pf_set_rt_ifp() stuff could be cleaned up if we can get away with
> this.

I think this should go in. I was trying to test this change, but
I'm unable to create set up which would work on google cloud in the
way I need. I suspect failing to convince underlying vswitch
to carry packets between test hosts for me.

as soon as I start to do something more fancy, packets start to disappear
in underlying fabric...

If I understand the change right we really need to take care of
route-to action for inbound packet 'to convince PF' stack
the packet actually enters firewall on interface desired by
route-to action ordered by rule found by currently executed pf_test().

for outbound case the IP stack will be executing the pf_test().

so yes, I'm OK with this change.

regards
sashan



Re: pf route-to issues

2021-01-04 Thread Alexandr Nedvedicky
Hello,


On Mon, Jan 04, 2021 at 06:37:54PM +0100, Alexander Bluhm wrote:
> On Mon, Jan 04, 2021 at 11:21:50PM +1000, David Gwynne wrote:
> > this chunk pops out as a standalone change.
> >
> > having pf_find_state() return PF_PASS here means the callers short
> > circuit and let the packet go through without running it through the
> > a lot of the state handling, which includes things like protocol state
> > updates, nat, scrubbing, some pflog handling, and most importantly,
> > later calls to pf_route().
> 
> pf_route() calls pf_test() again with a different interface.

yes, that's correct. In ideal world one of those should
happen:

a) no state is found, try to find matching rule

b) matching rule found, create a new state if the rule orders to do so

c) no rule found either, just accept packet
> 
> The idea of this code is, that the interface which is passed to
> pf_test() from ip_output() is wrong.  The call to pf_set_rt_ifp()
> changes it in the state.

can you clarify what 'wrong' means here?

> > Index: pf.c
> > ===
> > RCS file: /cvs/src/sys/net/pf.c,v
> > retrieving revision 1.1097
> > diff -u -p -r1.1097 pf.c
> > --- pf.c4 Jan 2021 12:48:27 -   1.1097
> > +++ pf.c4 Jan 2021 13:08:26 -
> > @@ -1122,12 +1122,6 @@ pf_find_state(struct pf_pdesc *pd, struc
> > }
> >
> > *state = s;
> > -   if (pd->dir == PF_OUT && s->rt_kif != NULL && s->rt_kif != pd->kif &&
verify interface matches the one specified in state for outbound packet.

if the interfaces don't match, then

> > -   ((s->rule.ptr->rt == PF_ROUTETO &&
> > -   s->rule.ptr->direction == PF_OUT) ||

we must be either dealing with state created by route-to action bound
to outbound rule.

> > -   (s->rule.ptr->rt == PF_REPLYTO &&
> > -   s->rule.ptr->direction == PF_IN)))
> > -   return (PF_PASS);

or reply-to action bound to inbound rule...


...if it is the case then we short circuit the state check and
assume the state matches...


let's consider the rules as follows:

pass out from 1.2.3.4 to any route-to 10.20.30.40@em1
pass out from any to 1.2.3.4 route-to 192.168.1.10@em0
pass out on em1 from 1.2.3.4 to any nat-to (em1)

let's further assume there is outbound packet on em0 interface.  the packet
matches the rule with 'route-to' action. state gets created:

out@em0 1.2.3.4 -> a.b.c.d route-to 10.20.30.40@em1

pf_route() gets called, and packet arrives to pf_test() again as outbound
on em1 interface. No state should be found, because em1 != em0.
so that complex if does not kick in.

Now let there be a response packet at em1 interface. It gets
translated to a.b.c.d -> 1.2.3.4 and forwarded. let's further
assume the packet hits (default) route, which goes via em3.
There is outbound packet at em3 inspected by pf_test():

out@em3 a.b.c.d -> 1.2.3.4

no matching state is found because of em3 != em0, so packet
hits the 'pass out from any to 1.2.3.4' rule. The rule creates
state:

out@em3 a.b.c.d -> 1.2.3.4 route-to 192.168.1.10@em0

and packet is sent out via pf_route('em0'). The packet enters
pf_test() again. We find matching state this time:

in@em0 1.2.3.4 -> a.b.c.d route-to@10.20.30.40@em1

let's see if the complex condition is met this time:

pd->dir == PF_OUT   (it is outbound packet)
pd->kif == em0  (bound to em0 interface)
s->rt_kif == em1(route* action is bound to em1)
s->rule.ptr->direction == PF_OUT (state got created by outbound rule)
s->rule.ptr == PF_ROUTETO (there is a route-to action at rule)

It looks like the complex if() test is satisfied this time, so
we bail out with PF_PASS. So what are consequences of skipping
proper state check?

first of all the state does not get updated. If the packet, which bypasses
state check is SYN-ACK, the state is left in SYN-SENT state, although the
connection initiator is in connected state already (assuming it receives
SYN-ACK).

I hope I have not missed anything in story above. it seems to me
the code in question can indeed do some harm (sometimes...)


thanks and
regards
sashan



Re: pf route-to issues

2021-01-04 Thread Alexandr Nedvedicky
Hello,



> My diff removes the kif here ...
> 
> > > > -   if (rv == 0) {
> > > > -   s->rt_kif = r->route.kif;
> > > > +   if (rv == 0)
> > > > s->natrule.ptr = r;
> > > > -   }
> 
> ... and the {}.
> 
> Anyway, it should not be commited without the userland part.
> (and not without compling it :-)

yes you are right. there was a conflict and I ignored the resulting
pf.c.rej. entirely my fault.

your diff looks good and should go in. The userland part will be tricky.
perhaps we should fork existing `host()` function and tailor it to route-to et.
al. The current `host()` as-is still tries to resolve name to interface if dns
fails. We don't want it now.

your diff reads OK to me.

thanks and
regards
sashan

> 
> bluhm



Re: pf route-to issues

2021-01-04 Thread Alexandr Nedvedicky
Hello,

I'm sorry I was not clear enough in my earlier email.

On Mon, Jan 04, 2021 at 03:56:45PM +0100, Alexander Bluhm wrote:
> On Mon, Jan 04, 2021 at 03:26:15PM +0100, Alexandr Nedvedicky wrote:
> > you refactoring diff requires a minor finishing touch to keep the
> > stuff compiling:
> 
> Did I commit something that does not compile?  I just made cvs
> update on another machine.  There it worked.

all diffs you commit compile, there are no doubts in this.
I was talking about change sent here:

https://marc.info/?l=openbsd-tech=160976516119388=2

diff above contains chunk here, which removes rt_kif.

8<---8<---8<---8<---8<8<
Index: net/pfvar.h
===
RCS file: /cvs/src/sys/net/pfvar.h,v
retrieving revision 1.497
diff -u -p -r1.497 pfvar.h
--- net/pfvar.h 14 Oct 2020 19:22:14 - 1.497
+++ net/pfvar.h 4 Jan 2021 12:52:02 -
@@ -762,7 +762,6 @@ struct pf_state {
  struct pf_sn_head  src_nodes;
  struct pf_state_key *key[2]; /* addresses stack and wire  */
  struct pfi_kif  *kif;
- struct pfi_kif  *rt_kif;
  u_int64_t   packets[2];
  u_int64_t   bytes[2];
  int32_tcreation;
8<---8<---8<---8<---8<8<

> 
> The rt_kif in pf_state still exists.  The diff below should not
> be necessary.  Maybe you forgot to clean pfvar.h.
> 

so either rt_kif must stay for a while, or your new diff (rebased on top of
stuff committed already) must be expanded by the nit pick I've sent.


to put it clear: I'm concerned the diff posted here:
https://marc.info/?l=openbsd-tech=160976516119388=2
is not complete and should not be committed as is.

thanks and
regards
sashan
> 
> > 8<---8<---8<---8<---8<8<
> > diff --git a/sys/net/pf.c b/sys/net/pf.c
> > index b8766df1686..3f9f5b13add 100644
> > --- a/sys/net/pf.c
> > +++ b/sys/net/pf.c
> > @@ -3428,16 +3428,13 @@ pf_set_rt_ifp(struct pf_state *s, struct pf_addr 
> > *saddr, sa_family_t af,
> > struct pf_rule *r = s->rule.ptr;
> > int rv;
> >  
> > -   s->rt_kif = NULL;
> > if (!r->rt)
> > return (0);
> >  
> > rv = pf_map_addr(af, r, saddr, >rt_addr, NULL, sns, 
> > >route, PF_SN_ROUTE);
> > -   if (rv == 0) {
> > -   s->rt_kif = r->route.kif;
> > +   if (rv == 0)
> > s->natrule.ptr = r;
> > -   }
> >  
> > return (rv);
> >  }
> > 8<---8<---8<---8<---8<8<
> > 
> > 
> > thanks and
> > regards
> > sashan
> 



Re: pf route-to issues

2021-01-04 Thread Alexandr Nedvedicky
Hello,

On Mon, Jan 04, 2021 at 01:57:24PM +0100, Alexander Bluhm wrote:
> On Mon, Jan 04, 2021 at 11:46:16AM +0100, Alexandr Nedvedicky wrote:
> > > let's put this in and then i'll have a look. ok by me.
> > bluhm's diff is fine with me.
> 
> Refactoring is commited, here is the remaining kernel diff after merge.
> 
> bluhm
> 



> Index: net/pf.c
> ===
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1097
> diff -u -p -r1.1097 pf.c
> --- net/pf.c  4 Jan 2021 12:48:27 -   1.1097
> +++ net/pf.c  4 Jan 2021 12:52:02 -
> @@ -1122,12 +1122,6 @@ pf_find_state(struct pf_pdesc *pd, struc
>   }
> 
>   *state = s;
> - if (pd->dir == PF_OUT && s->rt_kif != NULL && s->rt_kif != pd->kif &&
> - ((s->rule.ptr->rt == PF_ROUTETO &&
> - s->rule.ptr->direction == PF_OUT) ||
> - (s->rule.ptr->rt == PF_REPLYTO &&
> - s->rule.ptr->direction == PF_IN)))
> - return (PF_PASS);
> 
>   return (PF_MATCH);
>  }

I've sent OK to dlg for the chunk above. I think it's worth to have this
fix in separate changeset.


you refactoring diff requires a minor finishing touch to keep the
stuff compiling:

8<---8<---8<---8<---8<8<
diff --git a/sys/net/pf.c b/sys/net/pf.c
index b8766df1686..3f9f5b13add 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -3428,16 +3428,13 @@ pf_set_rt_ifp(struct pf_state *s, struct pf_addr 
*saddr, sa_family_t af,
struct pf_rule *r = s->rule.ptr;
int rv;
 
-   s->rt_kif = NULL;
if (!r->rt)
return (0);
 
rv = pf_map_addr(af, r, saddr, >rt_addr, NULL, sns, 
>route, PF_SN_ROUTE);
-   if (rv == 0) {
-   s->rt_kif = r->route.kif;
+   if (rv == 0)
s->natrule.ptr = r;
-   }
 
return (rv);
 }
8<---8<---8<---8<---8<8<


thanks and
regards
sashan



Re: pf route-to issues

2021-01-04 Thread Alexandr Nedvedicky
Hello,

On Mon, Jan 04, 2021 at 11:21:50PM +1000, David Gwynne wrote:
> On Mon, Jan 04, 2021 at 01:57:24PM +0100, Alexander Bluhm wrote:
> > On Mon, Jan 04, 2021 at 11:46:16AM +0100, Alexandr Nedvedicky wrote:
> > > > let's put this in and then i'll have a look. ok by me.
> > > bluhm's diff is fine with me.
> > 
> > Refactoring is commited, here is the remaining kernel diff after merge.
> 
> this chunk pops out as a standalone change.
> 
> having pf_find_state() return PF_PASS here means the callers short
> circuit and let the packet go through without running it through the
> a lot of the state handling, which includes things like protocol state
> updates, nat, scrubbing, some pflog handling, and most importantly,
> later calls to pf_route().
> 
> ok?

I think this should go in. I've seen it in bluhm's larger diff,
which I still need to finish.

I'm fine if change will be committed as it solves a real bug.

OK sashan

> 
> Index: pf.c
> ===
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1097
> diff -u -p -r1.1097 pf.c
> --- pf.c  4 Jan 2021 12:48:27 -   1.1097
> +++ pf.c  4 Jan 2021 13:08:26 -
> @@ -1122,12 +1122,6 @@ pf_find_state(struct pf_pdesc *pd, struc
>   }
>  
>   *state = s;
> - if (pd->dir == PF_OUT && s->rt_kif != NULL && s->rt_kif != pd->kif &&
> - ((s->rule.ptr->rt == PF_ROUTETO &&
> - s->rule.ptr->direction == PF_OUT) ||
> - (s->rule.ptr->rt == PF_REPLYTO &&
> - s->rule.ptr->direction == PF_IN)))
> - return (PF_PASS);
>  
>   return (PF_MATCH);
>  }



Re: pf route-to issues

2021-01-04 Thread Alexandr Nedvedicky
Hello,

there is one more thing, which just came up on my mind.


> 
> so i want to change route-to in pfctl so it takes a nexthop instead
> of an interface. you could argue that pf already lets you do this,
> because there's some bs nexthop@interface syntax. my counter argument
> is that the interface the nexthop is reachable over is redundant, and it
> makes fixing some of the other problems harder if we keep it.
> 

what is your plan for dup-to then? if my understanding of dup-to is
correct, then it allows administrator to copy matching packets and send
them out dedicated interface so another physical box (box with running
snort) can intercept them and process them.

I remember we had to do some assumptions about this, when porting PF to
Solaris. So Solaris interpretation of option

'dup-to net12'

is to send out copy of matching packet via net12 interface.  because there
is no next-hop specified, we just use link broadcast when pushing out the
packet to network. I agree this is a hack. If route-to will be changed
to accept next-hop instead of interface, then we will be able to kill
such hack.



> 
> if we limit the information needed for pf_route to a nexthop address,
> and which direction the address is used, this is doable. both the
> pf_state and pfsync_state structs already contain an address to store a
> nexthop in, i just had to move the route-to direction from the rule into
> the state. this is easy with pf_state, but i used a spare pad field in
> pfsync_state for this.
> 

this should be fine, because route-to et.al. don't work with 'block' rules.


thanks and
regards
sashan



Re: pf route-to issues

2021-01-04 Thread Alexandr Nedvedicky
Hello,


> 
> the stack should provide it on a 4 byte boundary, but it has uint64_t
> members. however, it is also __packed, so the compiler makes no
> assumptions about alignment.
> 
> > Is there anything else that can be split out easily?
> 
> let's put this in and then i'll have a look. ok by me.
> 

bluhm's diff is fine with me.

thanks and
regards
sashan



Re: IPv6 pf_test EACCES

2020-12-22 Thread Alexandr Nedvedicky
On Mon, Dec 21, 2020 at 11:34:04PM +0100, Alexander Bluhm wrote:
> Hi,
> 
> A while ago we decided to pass EACCES to uerland if pf blocks a
> packet.  IPv6 still has the old EHOSTUNREACH code.
> 
> Use the same errno for dropped IPv6 packets as in IPv4.
> 
> ok?
> 

looks good to me.

OK sashan



Re: Enhancing (some) PF state

2020-12-18 Thread Alexandr Nedvedicky
Hello Sven,

your change makes me wonder: 'what is the actual problem you are trying to
solve'?

the reason I'm asking is that latency is just one factor, which contributes to
TCP connection performance. The other factor (and perhaps more important) is to
guess amount of retransmitted data. Processes (a.k.a. endpoints), which
communicate over TCP can experience significant delay once TCP packets starts
to be dropped. Those dropped TCP packets contribute to delay experienced in the
more significant way, than 'network latency' in sense of roundtrip.

I'm not much experienced firewall administrator, the only firewall I run is
APU box at my home, hence I'm sorry if my question sounds naive. So basically
what sort of problem in network you hope to diagnose with PF?

And also don't get me wrong: I like your idea to extend PF to enable firewall
to provide better picture of what happens on network. I just want to point
out that sampling network latency (round-trip) might not be sufficient.

thanks and
regards
sashan



On Thu, Dec 17, 2020 at 02:44:41PM -0500, Sven F. wrote:
> Dear readers,
> 
> pfctl -vv -ss shows detailed information on states.
> I would like to improve the information provided about specific TCP 
> connections,
> regarding the latency of the network.
> An obvious way seems to be to measure the time to get ACKs back.
> Another way would be to use packets timestamps.
> 
> I patched my kernel to extract this information and
> log it, before trying to report it to the userland.
> I did not use timestamps, because I am not sure how i could do that.
> If you have any advice on that, it would be welcome.
> Moreover the patch is a prototype,
> So I would appreciate any feedback on my diff (attached):
> Currently the code is using a LABEL to trigger the measure,
> of course, later it should be a keyword like "latency" in the rules.
> For example :
> match proto tcp to port 80 latency
> Or something else.
> This would be discuss after choosing a method for latency computation,
> 
> Maybe there is a better way to extract network TCP latencies information
> ( i would like to avoid running in promiscuous, but if a current
> packaged software does it well... )
> but I did not come across it.
> Please share if you know of a better way to tackle this.
> 
> 
> Happy holidays !




Re: rw_lock_held() & friends

2020-12-07 Thread Alexandr Nedvedicky
On Mon, Dec 07, 2020 at 10:18:04PM +0100, Alexandr Nedvedicky wrote:
> On Mon, Dec 07, 2020 at 12:22:22PM -0800, Philip Guenther wrote:
> > On Mon, Dec 7, 2020 at 11:16 AM Alexandr Nedvedicky <
> > alexandr.nedvedi...@oracle.com> wrote:
> > 
> > > What's the plan for rw_write_held()? Currently macro is true, if whoever
> > > holds
> > > the lock.
> > >
> > > >
> > > > +#define  rw_write_held(rwl)  (rw_status(rwl) == RW_WRITE)
> > >
> > 
> > Nope.  rw_status() returns RW_WRITE_OTHER if a different thread holds the
> > lock.
> > 
> > 

I'm sorry being noisy it's getting late here. I agree with mpi's diff
and understand how rw_write_held() works in our OpenBSD.


regards
sashan



Re: rw_lock_held() & friends

2020-12-07 Thread Alexandr Nedvedicky
On Mon, Dec 07, 2020 at 12:22:22PM -0800, Philip Guenther wrote:
> On Mon, Dec 7, 2020 at 11:16 AM Alexandr Nedvedicky <
> alexandr.nedvedi...@oracle.com> wrote:
> 
> > What's the plan for rw_write_held()? Currently macro is true, if whoever
> > holds
> > the lock.
> >
> > >
> > > +#define  rw_write_held(rwl)  (rw_status(rwl) == RW_WRITE)
> >
> 
> Nope.  rw_status() returns RW_WRITE_OTHER if a different thread holds the
> lock.
> 
> 

may be I'm still missing something... the netbsd uses this implementation
of rw_write_held():

751 /*
752  * rw_write_held:
753  *
754  *  Returns true if the rwlock is held for writing.  Must only be
755  *  used for diagnostic assertions, and never be used to make
756  *  decisions about how to use a rwlock.
757  */
758 int
759 rw_write_held(krwlock_t *rw)
760 {
761 
762 if (rw == NULL)
763 return 0;
764 return (rw->rw_owner & (RW_WRITE_LOCKED | RW_THREAD)) ==
765 (RW_WRITE_LOCKED | (uintptr_t)curlwp);
766 }


I'm just worried the test proposed by mpi@:

#define rw_write_held(rwl) (rw_status(rwl) == RW_WRITE)

Is not entirely compatible with stuff in NetBSD. perhaps I'm just confused.

thanks for clarification.

regards
sashan



Re: rw_lock_held() & friends

2020-12-07 Thread Alexandr Nedvedicky
Hello,

What's the plan for rw_write_held()? Currently macro is true, if whoever holds
the lock.

>  
> +#define  rw_write_held(rwl)  (rw_status(rwl) == RW_WRITE)

I would expect something like instead:

#define rw_write_held(rwl)  (RWLOCK_OWNER(rwl) == curproc)

the macro above is true if the current process holds the lock.
I just want to make sure if it is really your intention.


thanks and
regards
sashan



Re: PF synproxy should act on inbound packets only

2020-12-03 Thread Alexandr Nedvedicky
Hello,


> 
> Just a style nit.  Other errors do not put stdin:1 in brackes.  One
> line per error.  In pf.conf the rule direction matters.  What about
> 
> stdin:1 warning: synproxy used for inbound rules only, ignored for outbound
> 

thanks, I like your suggestion.

below is updated diff. The new diff also updates pf.conf(5) manpage.

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sbin/pfctl/parse.y b/sbin/pfctl/parse.y
index f06171158cb..6c4dde1261f 100644
--- a/sbin/pfctl/parse.y
+++ b/sbin/pfctl/parse.y
@@ -4042,6 +4042,12 @@ rule_consistent(struct pf_rule *r)
"synproxy state or modulate state");
problems++;
}
+
+   if ((r->keep_state == PF_STATE_SYNPROXY) && (r->direction != PF_IN))
+   fprintf(stderr, "%s:%d: warning: "
+   "synproxy used for inbound rules only, "
+   "ignored for outbound\n", file->name, yylval.lineno);
+
if ((r->nat.addr.type != PF_ADDR_NONE ||
r->rdr.addr.type != PF_ADDR_NONE) &&
r->action != PF_MATCH && !r->keep_state) {
diff --git a/share/man/man5/pf.conf.5 b/share/man/man5/pf.conf.5
index e81198370c9..b77ba5d326c 100644
--- a/share/man/man5/pf.conf.5
+++ b/share/man/man5/pf.conf.5
@@ -2126,6 +2126,9 @@ will not work if
 .Xr pf 4
 operates on a
 .Xr bridge 4 .
+Also
+.Cm synproxy state
+option acts on inbound packets only.
 .Pp
 Example:
 .Bd -literal -offset indent
diff --git a/sys/net/pf.c b/sys/net/pf.c
index 823fdc22133..986ee57bff9 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -4161,7 +4161,7 @@ pf_create_state(struct pf_pdesc *pd, struct pf_rule *r, 
struct pf_rule *a,
s->tag = tag;
}
if (pd->proto == IPPROTO_TCP && (th->th_flags & (TH_SYN|TH_ACK)) ==
-   TH_SYN && r->keep_state == PF_STATE_SYNPROXY) {
+   TH_SYN && r->keep_state == PF_STATE_SYNPROXY && pd->dir == PF_IN) {
int rtid = pd->rdomain;
if (act->rtableid >= 0)
rtid = act->rtableid;



PF synproxy should act on inbound packets only

2020-12-01 Thread Alexandr Nedvedicky
Hello,

the issue described here has been hit bu Stuart some time ago.  feel free to
stop reading if you don't care/use pf(4) synproxy.

let's assume there are rules which allow just surfing web over http:

block all
pass proto tcp from any to any port = 80 synproxy state
pass proto udp from any to any port = 53 

The 'synproxy' rule is not bound to any interface nor any direction. As such
it matches every forwarded SYN packet to port 80 two times. If there would
have been no 'synproxy state' in rule, then pair of states gets created:

A -> B:80 @ IN  # inbound direction at RX NIC

A -> B:80 @ OUT # outbound direction at TX NIC

The 'synproxy' action changes things a bit. The 'synproxy' action is
handled here in pf_create_state():


4163 if (pd->proto == IPPROTO_TCP && (th->th_flags & (TH_SYN|TH_ACK)) ==
4164 TH_SYN && r->keep_state == PF_STATE_SYNPROXY) {
4165 int rtid = pd->rdomain;
4166 if (act->rtableid >= 0)
4167 rtid = act->rtableid;
4168 pf_set_protostate(s, PF_PEER_SRC, PF_TCPS_PROXY_SRC);
4169 s->src.seqhi = arc4random();
4170 /* Find mss option */
4171 mss = pf_get_mss(pd);
4172 mss = pf_calc_mss(pd->src, pd->af, rtid, mss);
4173 mss = pf_calc_mss(pd->dst, pd->af, rtid, mss);
4174 s->src.mss = mss;
4175 pf_send_tcp(r, pd->af, pd->dst, pd->src, th->th_dport,
4176 th->th_sport, s->src.seqhi, ntohl(th->th_seq) + 1,
4177 TH_SYN|TH_ACK, 0, s->src.mss, 0, 1, 0, pd->rdomain);
 ^^ 
4178 REASON_SET(, PFRES_SYNPROXY);
4179 return (PF_SYNPROXY_DROP);
4180 }

Note pf_send_tcp() at line 4175 is being called with argument `tag` set to 1.
`tag` == 1 orders pf_send_tcp() to send PF_TAG_GENERATED flag at packet.

2770 struct mbuf *
2771 pf_build_tcp(const struct pf_rule *r, sa_family_t af,
2772 const struct pf_addr *saddr, const struct pf_addr *daddr,
2773 u_int16_t sport, u_int16_t dport, u_int32_t seq, u_int32_t ack,
2774 u_int8_t flags, u_int16_t win, u_int16_t mss, u_int8_t ttl, int tag,
2775 u_int16_t rtag, u_int sack, u_int rdom)
2776 {

2805 
2806 /* create outgoing mbuf */
2807 m = m_gethdr(M_DONTWAIT, MT_HEADER);
2808 if (m == NULL)
2809 return (NULL);
2810 if (tag)
2811 m->m_pkthdr.pf.flags |= PF_TAG_GENERATED;
2812 m->m_pkthdr.pf.tag = rtag;

the PF_TAG_GENERATED flag brings an unexpected effect to packet sent by
pf_send_tcp() at line 4175. The packet sent by pf_send_tcp() is intercepted
by pf_test() as outbound. However PF_TAG_GENERATED flag turns pf_test()
into No-OP:

6855 int
6856 pf_test(sa_family_t af, int fwdir, struct ifnet *ifp, struct mbuf **m0)
6857 {

#ifdef DIAGNOSTIC
6890 if (((*m0)->m_flags & M_PKTHDR) == 0)
6891 panic("non-M_PKTHDR is passed to pf_test");
6892 #endif /* DIAGNOSTIC */
6893 
6894 if ((*m0)->m_pkthdr.pf.flags & PF_TAG_GENERATED)
6895 return (PF_PASS);
6896 
6897 if ((*m0)->m_pkthdr.pf.flags & PF_TAG_DIVERTED_PACKET)
6898 return (PF_PASS);
6899 

however it is not desired in this case. We actually do want PF to create a
state for such packet, so response B:80 -> A can enter the firewall.

the fix is to apply synproxy action on inbound packets only. Diff below
does that exactly. Furthermore it also makes pfctl(8) to emit warning,
when synproxy is being used in outbound/unbound rule:

lumpy$ echo 'pass proto tcp from any to any port = 80 synproxy state' |./pfctl 
-nf -  
warning (stdin:1): synproxy acts on inbound packets only
synproxy action is ignored for outbound packets


OK?

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sbin/pfctl/parse.y b/sbin/pfctl/parse.y
index f06171158cb..d052b5b9a0e 100644
--- a/sbin/pfctl/parse.y
+++ b/sbin/pfctl/parse.y
@@ -4042,6 +4042,13 @@ rule_consistent(struct pf_rule *r)
"synproxy state or modulate state");
problems++;
}
+
+   if ((r->keep_state == PF_STATE_SYNPROXY) && (r->direction != PF_IN))
+   fprintf(stderr, "warning (%s:%d): "
+   "synproxy acts on inbound packets only\n"
+   "synproxy action is ignored for outbound packets\n",
+   file->name, yylval.lineno);
+
if ((r->nat.addr.type != PF_ADDR_NONE ||
r->rdr.addr.type != PF_ADDR_NONE) &&
r->action != PF_MATCH && !r->keep_state) {
diff --git a/sys/net/pf.c b/sys/net/pf.c
index 823fdc22133..986ee57bff9 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -4161,7 +4161,7 @@ pf_create_state(struct pf_pdesc *pd, struct pf_rule *r, 
struct 

Re: push NET_LOCK() down in pf_ioctl.c

2020-10-19 Thread Alexandr Nedvedicky
Hello Klemens,


> > the change is fairly large, but mostly mechanical.
> Relocating malloc(9) and pool(9) seems good but other pf_*() calls are
> now locked inconsistently after your diff.
> 
> Given that only reason about "allocations" this is either an oversight
> and should be fixed or you need to provide more explanation.
> 
> See inline for the spots I'm talking about.

thanks for catching this. there was unfinished clean up in a branch
I took diff from.

Currently we need to keep pf_rm_rule() under both locks. The function
might be calling pf_tag_unref(), pf_dynaddr_remove()... which alter lists,
which are currently supposed to be protected by PF_LOCK()/NET_LOCK().

updated diff is below. pf_rm_rule() is being called with both locks held.


thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index ef7d995e5a7..3a75cd05005 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -1006,10 +1006,10 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
return (EACCES);
}
 
-   NET_LOCK();
switch (cmd) {
 
case DIOCSTART:
+   NET_LOCK();
PF_LOCK();
if (pf_status.running)
error = EEXIST;
@@ -1025,9 +1025,11 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
DPFPRINTF(LOG_NOTICE, "pf: started");
}
PF_UNLOCK();
+   NET_UNLOCK();
break;
 
case DIOCSTOP:
+   NET_LOCK();
PF_LOCK();
if (!pf_status.running)
error = ENOENT;
@@ -1038,6 +1040,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
DPFPRINTF(LOG_NOTICE, "pf: stopped");
}
PF_UNLOCK();
+   NET_UNLOCK();
break;
 
case DIOCGETQUEUES: {
@@ -1045,6 +1048,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
struct pf_queuespec *qs;
u_int32_tnr = 0;
 
+   NET_LOCK();
PF_LOCK();
pq->ticket = pf_main_ruleset.rules.active.ticket;
 
@@ -1056,6 +1060,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
}
pq->nr = nr;
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
 
@@ -1064,10 +1069,12 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
struct pf_queuespec *qs;
u_int32_tnr = 0;
 
+   NET_LOCK();
PF_LOCK();
if (pq->ticket != pf_main_ruleset.rules.active.ticket) {
error = EBUSY;
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
 
@@ -1078,10 +1085,12 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
if (qs == NULL) {
error = EBUSY;
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
memcpy(>queue, qs, sizeof(pq->queue));
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
 
@@ -1091,10 +1100,12 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
u_int32_tnr;
int  nbytes;
 
+   NET_LOCK();
PF_LOCK();
if (pq->ticket != pf_main_ruleset.rules.active.ticket) {
error = EBUSY;
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
nbytes = pq->nbytes;
@@ -1107,6 +1118,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
if (qs == NULL) {
error = EBUSY;
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
memcpy(>queue, qs, sizeof(pq->queue));
@@ -1121,6 +1133,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
if (error == 0)
pq->nbytes = nbytes;
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
 
@@ -1128,38 +1141,44 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
struct pfioc_queue  *q = (struct pfioc_queue *)addr;
struct pf_queuespec *qs;
 
-   PF_LOCK();
-   if (q->ticket != 

Re: pf route-to issues

2020-10-19 Thread Alexandr Nedvedicky
Hello,


> > 
> > it seems to me 'route-to vs. pfsync' still needs more thought.  the
> > next-hop IP address in route-to may be different for each PF box
> > linked by pfsync(4). To be honest I have no answer to address this
> > issue at the moment.
> 
> i have thought about that a little bit. we could play with what the
> argument to route-to means. rather than requiring it to be a directly
> connected host/gateway address, we could interpret it as a destination
> address, and use the gateway for that destination as the nexthop.
> 
> eg, if i have the following routing table on frontend a:
> 
> Internet:
> DestinationGatewayFlags   Refs  Use   Mtu  Prio Iface
> default192.168.96.33  UGS6   176171 - 8 vmx0 
> 224/4  127.0.0.1  URS00 32768 8 lo0  
> 10.0.0.0/30192.168.0.1UGS00 - 8 gre0 

> 
> and this routing table on frontend b:
> 
> Internet:
> DestinationGatewayFlags   Refs  Use   Mtu  Prio Iface
> default192.168.96.33  UGS987548 - 8 aggr0
> 224/4  127.0.0.1  URS00 32768 8 lo0  
> 10.0.0.0/30192.168.0.3UGS00 - 8 gre0 

> 
> if gre0 on both frontends pointed at different legs on the same backend
> server, i could write a pf rule like this:
> 
> pass out to port 80 route-to 10.0.0.1
> 
> 10.0.0.1 would then end up as the rt_addr field in pf_state and
> pfsync_state.
> 
> both frontend a and b would lookup the route to 10.0.0.1, and then
> use 192.168.0.1 and 192.168.0.3 as the gateway address respectively.
> both would end up pushing the packet over their gre link to the
> same backend. the same semantic would work if the link to the backend
> was over ethernet instead of a tunnel.
> 
> > > thoughts?
> > > 
> > 
> > What you've said makes sense. However I still feel pfsync(4)
> > does not play well with route-to.
> 
> maybe your opinion is different if the above makes sense?
> 

Thanks for detailed explanation. This is good enough to make me happy.
The remaining questions on this are sort of 'homework for me' to poke
to PF source code, for example:
are we doing route look up for every packet? or route look up
is performed when state is created/imported? (and we cache
outbound interface + next-hop along the state)

also what happens when route does not exist on pfsync peer, which
receives state? How admin will discover state failed to import?

Anyway, your plan above looks solid to me now. It's certainly more flexible
(?reliable?) to select route to particular destination, than using pair of
interface,next-hop.

thanks and
regards
sashan



Re: pf route-to issues

2020-10-19 Thread Alexandr Nedvedicky
Hello,

disclaimer: I have no chance to run pfsync on production, I'm very
inexperienced with pfsync(4).


> 
> the third problem is that pf_route relies on information from rules to
> work correctly. this is a problem in a pfsync environment because you
> cannot have the same ruleset on both firewalls 100% of the time, which
> means you cannot have route-to/reply-to behave consistently on a pair of
> firwalls 100% of the time.
> 
> my solution to both these problems is reduce the amount of information
> pf_route needs to work with, to make sure that the info it does need
> is in the pf state structure, and that pfsync handles it properly.
> 
> if we limit the information needed for pf_route to a nexthop address,
> and which direction the address is used, this is doable. both the
> pf_state and pfsync_state structs already contain an address to store a
> nexthop in, i just had to move the route-to direction from the rule into
> the state. this is easy with pf_state, but i used a spare pad field in
> pfsync_state for this.
> 

it seems to me 'route-to vs. pfsync' still needs more thought.  the
next-hop IP address in route-to may be different for each PF box
linked by pfsync(4). To be honest I have no answer to address this
issue at the moment.

> 
> thoughts?
> 

What you've said makes sense. However I still feel pfsync(4)
does not play well with route-to.


thanks and
regards
sashan



push NET_LOCK() down in pf_ioctl.c

2020-10-16 Thread Alexandr Nedvedicky
Hello,

I've just found a forgotten diff in my tree. The diff pushes the NET_LCOK()
further down in PF driver ioctl() path.  The idea is to avoid sleeping while
holding a NET_LOCK().  this typically may happen when we need to allocate
memory. The diff is the first step as it takes care of easy/straightforward
cases of such allocations. The allocations, which still may happen under
the NET_LOCK() require more work in areas:
PF tables,
packet queues,
transactions,

the change is fairly large, but mostly mechanical.

OK?

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index ef7d995e5a7..bac644fa6d1 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -1006,10 +1006,10 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
return (EACCES);
}
 
-   NET_LOCK();
switch (cmd) {
 
case DIOCSTART:
+   NET_LOCK();
PF_LOCK();
if (pf_status.running)
error = EEXIST;
@@ -1025,9 +1025,11 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
DPFPRINTF(LOG_NOTICE, "pf: started");
}
PF_UNLOCK();
+   NET_UNLOCK();
break;
 
case DIOCSTOP:
+   NET_LOCK();
PF_LOCK();
if (!pf_status.running)
error = ENOENT;
@@ -1038,6 +1040,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
DPFPRINTF(LOG_NOTICE, "pf: stopped");
}
PF_UNLOCK();
+   NET_UNLOCK();
break;
 
case DIOCGETQUEUES: {
@@ -1045,6 +1048,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
struct pf_queuespec *qs;
u_int32_tnr = 0;
 
+   NET_LOCK();
PF_LOCK();
pq->ticket = pf_main_ruleset.rules.active.ticket;
 
@@ -1056,6 +1060,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
}
pq->nr = nr;
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
 
@@ -1064,10 +1069,12 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
struct pf_queuespec *qs;
u_int32_tnr = 0;
 
+   NET_LOCK();
PF_LOCK();
if (pq->ticket != pf_main_ruleset.rules.active.ticket) {
error = EBUSY;
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
 
@@ -1078,10 +1085,12 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
if (qs == NULL) {
error = EBUSY;
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
memcpy(>queue, qs, sizeof(pq->queue));
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
 
@@ -1091,10 +1100,12 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
u_int32_tnr;
int  nbytes;
 
+   NET_LOCK();
PF_LOCK();
if (pq->ticket != pf_main_ruleset.rules.active.ticket) {
error = EBUSY;
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
nbytes = pq->nbytes;
@@ -1107,6 +1118,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
if (qs == NULL) {
error = EBUSY;
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
memcpy(>queue, qs, sizeof(pq->queue));
@@ -1121,6 +1133,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
if (error == 0)
pq->nbytes = nbytes;
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
 
@@ -1128,38 +1141,44 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
struct pfioc_queue  *q = (struct pfioc_queue *)addr;
struct pf_queuespec *qs;
 
-   PF_LOCK();
-   if (q->ticket != pf_main_ruleset.rules.inactive.ticket) {
-   error = EBUSY;
-   PF_UNLOCK();
-   break;
-   }
qs = pool_get(_queue_pl, PR_WAITOK|PR_LIMITFAIL|PR_ZERO);
if (qs 

Re: net/pfvar.h: MAXPATHLEN -> PATH_MAX

2020-10-13 Thread Alexandr Nedvedicky
Hello,

On Tue, Oct 13, 2020 at 10:31:28PM +0200, Christian Weisgerber wrote:
> In revision 1.407 of , deraadt@ replaced MAXPATHLEN
> with PATH_MAX so userland wouldn't have to pull in .
> 
> In 1.466, sashan@ accidentally slipped one MAXPATHLEN back in,
> because its use is ubiquitous on the kernel side of pf.  Switch
> this over to PATH_MAX again.
> 
> pfctl(8) doesn't actually use this, so it's very cosmetic.

fine with me. and sorry for inconveniences.

OK sashan

> 
> Index: sys/net/pfvar.h
> ===
> RCS file: /cvs/src/sys/net/pfvar.h,v
> retrieving revision 1.496
> diff -u -p -r1.496 pfvar.h
> --- sys/net/pfvar.h   24 Aug 2020 15:30:58 -  1.496
> +++ sys/net/pfvar.h   13 Oct 2020 20:21:04 -
> @@ -475,7 +475,7 @@ union pf_rule_ptr {
>  };
>  
>  #define  PF_ANCHOR_NAME_SIZE  64
> -#define  PF_ANCHOR_MAXPATH   (MAXPATHLEN - PF_ANCHOR_NAME_SIZE - 1)
> +#define  PF_ANCHOR_MAXPATH   (PATH_MAX - PF_ANCHOR_NAME_SIZE - 1)
>  #define  PF_OPTIMIZER_TABLE_PFX  "__automatic_"
>  
>  struct pf_rule {
> -- 
> Christian "naddy" Weisgerber  na...@mips.inka.de
> 



Re: Non-const basename: sbin/pfctl

2020-10-13 Thread Alexandr Nedvedicky
Hello,

On Tue, Oct 13, 2020 at 10:52:58PM +0200, Christian Weisgerber wrote:
> Accommodate a basename(3) that takes a non-const parameter and may
> in fact modify the string buffer.
> 
> The length of anchor has already been checked in main().
> 
> ok?

looks OK to me.

sashan

> 
> Index: sbin/pfctl/pfctl.c
> ===
> RCS file: /cvs/src/sbin/pfctl/pfctl.c,v
> retrieving revision 1.382
> diff -u -p -r1.382 pfctl.c
> --- sbin/pfctl/pfctl.c16 Jan 2020 01:02:20 -  1.382
> +++ sbin/pfctl/pfctl.c13 Oct 2020 20:45:16 -
> @@ -2241,16 +2241,19 @@ pfctl_get_anchors(int dev, const char *a
>  {
>   struct pfioc_rulesetpr;
>   static struct pfr_anchors anchors;
> + char anchorbuf[PATH_MAX];
>   char *n;
>  
>   SLIST_INIT();
>  
>   memset(, 0, sizeof(pr));
>   if (*anchor != '\0') {
> - n = dirname(anchor);
> + strlcpy(anchorbuf, anchor, sizeof(anchorbuf));
> + n = dirname(anchorbuf);
>   if (n[0] != '.' && n[1] != '\0')
>   strlcpy(pr.path, n, sizeof(pr.path));
> - n = basename(anchor);
> + strlcpy(anchorbuf, anchor, sizeof(anchorbuf));
> + n = basename(anchorbuf);
>   if (n != NULL)
>   strlcpy(pr.name, n, sizeof(pr.name));
>   }
> -- 
> Christian "naddy" Weisgerber  na...@mips.inka.de
> 



Re: diff: pfctl: error message for nonexisting rtable

2020-10-01 Thread Alexandr Nedvedicky
Hello,

On Wed, Sep 30, 2020 at 11:02:28PM +0200, Klemens Nanni wrote:
> On Sun, Sep 20, 2020 at 07:29:38PM +0200, Klemens Nanni wrote:
> > Rebased diff after yasouka's pfctl commit;  it still takes care of
> > rdomains only, but I'd appreciate folks using `on rdomain' in their
> > pf.conf test this.  If this works out I'd like to put it in shortly
> > after release and work on rtables next.
> Working for me so far, anyone else playing around with it?
> 
> I also checked CVS log and the existing rtable_l2() check goes back to
> claudio's introduction of `on rdomain N' in pf, i.e. it's been there
> from the start as a harmless but needless check and has not been added
> after the fact to fix anything.
> 
> The time is now, so here's an updated diff after sashan pointed out how
> I erroneously checked the rtable ID instead of the rdomain ID in the
> ioctl (copy/pasta mistake).
> 
> Feedback? OK?

fix looks OK to me.

thanks and
regards
sashan



Re: pf: remove ptr_array from struct pf_ruleset

2020-08-24 Thread Alexandr Nedvedicky
Hello,


> 
> Admins using `once' rules are hopefully aware of this caveat already,
> but now the checksum actually indicates out-of-sync rulesets and does
> no longer present the same checksum for different rulesets.
> 
> Feedback? OK?
> 
> 

OK sashan@



Re: pf: route-to least-states

2020-07-28 Thread Alexandr Nedvedicky
Hello Yasuoka,


On Wed, Jul 29, 2020 at 01:55:20AM +0900, YASUOKA Masahiko wrote:
> Hi,
> 
> Let me add another fix of previous.
> 
> ok?

OK. thanks for taking care of that. I've entirely missed 1 vs. -1 return
value, when reviewing your change.

regards
sashan



Re: pf: route-to least-states

2020-07-28 Thread Alexandr Nedvedicky
Hello Yasuoka,

On Wed, Jul 29, 2020 at 01:22:48AM +0900, YASUOKA Masahiko wrote:
> Hi,
> 
> Previous commit has a wrong part..
> 
> ok?
> 
> Fix previous commit which referred wrong address.

would it make sense to move the block, you've introduced earler
under the !PF_AZERO() branch just couple lines below. something
like this:

8<---8<---8<--8<
diff --git a/sys/net/pf_lb.c b/sys/net/pf_lb.c
index 510795a4d0b..f77d96a99ec 100644
--- a/sys/net/pf_lb.c
+++ b/sys/net/pf_lb.c
@@ -322,13 +322,13 @@ pf_map_addr_sticky(sa_family_t af, struct pf_rule *r, 
struct pf_addr *saddr,
return (-1);
}
 
-   if ((rpool->opts & PF_POOL_TYPEMASK) == PF_POOL_LEASTSTATES) {
-   if (pf_map_addr_states_increase(af, rpool, naddr) == -1)
+   if (!PF_AZERO(cached, af)) {
+   pf_addrcpy(naddr, cached, af);
+   if ((rpool->opts & PF_POOL_TYPEMASK) == PF_POOL_LEASTSTATES) &&
+   ((pf_map_addr_states_increase(af, rpool, cached) == -1))
return (-1);
}
 
-   if (!PF_AZERO(cached, af))
-   pf_addrcpy(naddr, cached, af);
if (pf_status.debug >= LOG_DEBUG) {
log(LOG_DEBUG, "pf: pf_map_addr: "
"src tracking (%u) maps ", type);

8<---8<---8<--8<

It seems to me it would be better to bump number of states if and only if we
actually find some address in pool.

thanks and
regards
sashan



Re: pf: route-to least-states

2020-07-24 Thread Alexandr Nedvedicky
Hello Yasuoka,

> 
> Yes.
> 
> Let me simplify the block for "least-states".
> 


thanks for your explanation. It helped me to understand the code.

I'm OK with your fix.

thanks and
regards
sashan



Re: pf: route-to least-states

2020-07-23 Thread Alexandr Nedvedicky
Hello Yasuoka,



> - interface is not selected properly if selected table entry specifies
>   an interface.

to be honest I don't quite understand what's going on here.
can you share some details of configuration/scenario, which
triggers the bug your diff is fixing?


the part of your change, which I'm not able to figure out is
this single line:

> + if (pf_map_addr_states_increase(af, rpool, naddr) == -1)
> + return (1);
> + /* revert the kif which was set by pfr_pool_get() */
> + rpool->kif = kif;
>   break;
>   }

your fix changes behavior, which is there since least-state
option has been introduced. I believe it does not matter
for case when route-to specifies single interface such as:

route-to 192.168.1.10@em0 least-states

I'm not sure what will happen in situation, when there are more interfaces
specified in combination with sticky-address:

route-to {192.168.1.10@em0, 192.168.1.20@em1} last-states sticky-address

the resulting code does not look quite right with your diff applied:

602 } while (pf_match_addr(1, , rmask, >counter, 
af) &&
603 (states > 0));
604 
605 if (pf_map_addr_states_increase(af, rpool, naddr) == -1)
606 return (1);
607 /* revert the kif which was set by pfr_pool_get() */
608 rpool->kif = kif;
609 break;
610 }
611 
612 if (rpool->opts & PF_POOL_STICKYADDR) {
613 if (sns[type] != NULL) {
614 pf_remove_src_node(sns[type]);
615 sns[type] = NULL;
616 }
617 if (pf_insert_src_node([type], r, type, af, saddr, 
naddr,
618 rpool->kif))
619 return (1);
620 }


at line 608 new code reverts kif set by pfr_pool_get(). At line 617
(executed when sticky-address is set) the original code passes kif chosen be
pfr_pool_get(). You diff changes that behavior. So my question is simple:
is that intentional change?

thanks and
regards
sashan



Re: pf: route-to {random,srchash} in an anchor

2020-07-23 Thread Alexandr Nedvedicky
Hello Yasuok,

On Thu, Jul 23, 2020 at 08:01:18PM +0900, YASUOKA Masahiko wrote:
> Hi,
> 
> Last month, I fixed the problem "route-to least-state" in an anchor
> didn't work.
> 
> https://marc.info/?t=15911745782=1=2
> 
> I noticed the same problem happens on "random" and "srchash" as well.
> 
> ok?

change looks good. I have just one nit-pick comment. I leave decision
whether it's worth to adjust your diff or commit as-is up to you.

see in-line further below.

OK sashan@

> 
> Use the table on root always if current table is not active.
> 
> Index: sys/net/pf_lb.c
> ===
> RCS file: /disk/cvs/openbsd/src/sys/net/pf_lb.c,v
> retrieving revision 1.64
> diff -u -p -r1.64 pf_lb.c
> --- sys/net/pf_lb.c   2 Jul 2019 09:04:53 -   1.64
> +++ sys/net/pf_lb.c   23 Jul 2020 10:45:48 -
> @@ -345,6 +345,7 @@ pf_map_addr(sa_family_t af, struct pf_ru
>   struct pf_addr   faddr;
>   struct pf_addr  *raddr = >addr.v.a.addr;
>   struct pf_addr  *rmask = >addr.v.a.mask;
> + struct pfr_ktable   *kt;
>   u_int64_tstates;
>   u_int16_tweight;
>   u_int64_tload;
> @@ -396,18 +397,17 @@ pf_map_addr(sa_family_t af, struct pf_ru
>   pf_poolmask(naddr, raddr, rmask, saddr, af);
>   break;
>   case PF_POOL_RANDOM:
> - if (rpool->addr.type == PF_ADDR_TABLE) {
> - cnt = rpool->addr.p.tbl->pfrkt_cnt;
> - if (cnt == 0)
> - rpool->tblidx = 0;
> + if (rpool->addr.type == PF_ADDR_TABLE ||
> + rpool->addr.type == PF_ADDR_DYNIFTL) {
> + if (rpool->addr.type == PF_ADDR_TABLE)
> + kt = rpool->addr.p.tbl;
>   else
> - rpool->tblidx = (int)arc4random_uniform(cnt);
> - memset(>counter, 0, sizeof(rpool->counter));
> - if (pfr_pool_get(rpool, , , af))
> + kt = rpool->addr.p.dyn->pfid_kt;
> + kt = pfr_ktable_select_active(kt);
> + if (!kt)

I would prefer to use 'kt == NULL', so it becomes
clear we deal with pointer.


> @@ -453,18 +453,18 @@ pf_map_addr(sa_family_t af, struct pf_ru
>   case PF_POOL_SRCHASH:
>   hashidx =
>   pf_hash(saddr, (struct pf_addr *), >key, af);
> - if (rpool->addr.type == PF_ADDR_TABLE) {
> - cnt = rpool->addr.p.tbl->pfrkt_cnt;
> - if (cnt == 0)
> - rpool->tblidx = 0;
> +
> + if (rpool->addr.type == PF_ADDR_TABLE ||
> + rpool->addr.type == PF_ADDR_DYNIFTL) {
> + if (rpool->addr.type == PF_ADDR_TABLE)
> + kt = rpool->addr.p.tbl;
>   else
> - rpool->tblidx = (int)(hashidx % cnt);
> - memset(>counter, 0, sizeof(rpool->counter));
> - if (pfr_pool_get(rpool, , , af))
> + kt = rpool->addr.p.dyn->pfid_kt;
> + kt = pfr_ktable_select_active(kt);
> + if (!kt)

same here.

> ===
> RCS file: /disk/cvs/openbsd/src/sys/net/pf_table.c,v
> retrieving revision 1.133
> diff -u -p -r1.133 pf_table.c
> --- sys/net/pf_table.c24 Jun 2020 22:03:43 -  1.133
> +++ sys/net/pf_table.c23 Jul 2020 10:45:48 -
> @@ -2108,9 +2108,8 @@ pfr_kentry_byaddr(struct pfr_ktable *kt,
>   struct sockaddr_in6  tmp6;
>  #endif /* INET6 */
>  
> - if (!(kt->pfrkt_flags & PFR_TFLAG_ACTIVE) && kt->pfrkt_root != NULL)
> - kt = kt->pfrkt_root;
> - if (!(kt->pfrkt_flags & PFR_TFLAG_ACTIVE))
> + kt = pfr_ktable_select_active(kt);
> + if (!kt)
^^^
and here too.


> @@ -2308,9 +2306,8 @@ pfr_pool_get(struct pf_pool *rpool, stru
>   kt = rpool->addr.p.dyn->pfid_kt;
>   else
>   return (-1);
> - if (!(kt->pfrkt_flags & PFR_TFLAG_ACTIVE) && kt->pfrkt_root != NULL)
> - kt = kt->pfrkt_root;
> - if (!(kt->pfrkt_flags & PFR_TFLAG_ACTIVE))
> + kt = pfr_ktable_select_active(kt);
> + if (!kt)

here as well


I like the introduction of pfr_ktable_select_active(), it reads far better
than current code in tree.

thanks and
regards
sashan



Re: pfctl.8: mention hostid and checksum for -s info

2020-07-20 Thread Alexandr Nedvedicky
Hello,


On Mon, Jul 20, 2020 at 03:23:41PM +0200, Klemens Nanni wrote:
> Getting the checksum with pfctl(8) is either in your finger's muscle
> memory or takes guess work as the manual doesn't mention it.
> 
> I grepped the code to see that I need `-s info' with `-v'.
> 
> (Setting) hostid is described in pf.conf(5) but pfctl(8) doesn't tell us
> how to print it, there's merely an example for killing states based on
> it.
> 
> Complete the description of `-s info' to mention and describe both such
> that grepping for it in the manual pager yields any results.
> 
> Feedback? OK?
> 

fine with me.

OK sashan



Re: pf: remove ptr_array from struct pf_ruleset

2020-07-20 Thread Alexandr Nedvedicky
Hello Klemens.

I took a closer look at your change and related area.  Below is an alternate
way to fix the bug you've found.

there are few details worth to note:

ptr_array matters to main ruleset only, because it is the only 
ruleset covered by MD5 sum for pfsync.

it looks like we should also recompute the MD5sum if we remove the rule
from the main ruleset 

my diff introduces a new member of pr_ruleset structure, which keeps the
array size so we can pass it to free(9f) later.

I have no pfsync deployment readily available for testing. Will be glad
if you can give my diff a try.

thanks a lot
regards
sashan

8<--8<--8<-8<---
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index a486745bc1c..00a4f70e7db 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -332,8 +332,32 @@ pf_purge_rule(struct pf_rule *rule)
 
pf_rm_rule(ruleset->rules.active.ptr, rule);
ruleset->rules.active.rcount--;
-   TAILQ_FOREACH(rule, ruleset->rules.active.ptr, entries)
-   rule->nr = nr++;
+
+   /*
+* pf_chksum calculation is copied and modified from
+* pf_setup_pfsync_matching(), which is part of ruleset commit
+* operation.
+*/
+   if (ruleset == _main_ruleset) {
+   MD5_CTX ctx;
+   u_int8_tdigest[PF_MD5_DIGEST_LENGTH];
+
+   MD5Init();
+
+   TAILQ_FOREACH(rule, ruleset->rules.active.ptr, entries) {
+   rule->nr = nr++;
+   pf_hash_rule(, rule);
+   ruleset->rules.active.ptr_array[rule->nr] = rule;
+   }
+
+   MD5Final(digest, );
+   memcpy(pf_status.pf_chksum, digest, 
sizeof(pf_status.pf_chksum));
+   } else {
+   TAILQ_FOREACH(rule, ruleset->rules.active.ptr, entries) {
+   rule->nr = nr++;
+   }
+   }
+
ruleset->rules.active.ticket++;
pf_calc_skip_steps(ruleset->rules.active.ptr);
pf_remove_if_empty_ruleset(ruleset);
@@ -804,6 +828,7 @@ pf_commit_rules(u_int32_t ticket, char *anchor)
 {
struct pf_ruleset   *rs;
struct pf_rule  *rule, **old_array;
+   size_t  old_array_sz;
struct pf_rulequeue *old_rules;
int  error;
u_int32_told_rcount;
@@ -827,13 +852,16 @@ pf_commit_rules(u_int32_t ticket, char *anchor)
old_rules = rs->rules.active.ptr;
old_rcount = rs->rules.active.rcount;
old_array = rs->rules.active.ptr_array;
+   old_array_sz = rs->rules.active.array_sz;
 
rs->rules.active.ptr = rs->rules.inactive.ptr;
rs->rules.active.ptr_array = rs->rules.inactive.ptr_array;
rs->rules.active.rcount = rs->rules.inactive.rcount;
+   rs->rules.active.array_sz = rs->rules.inactive.array_sz;
rs->rules.inactive.ptr = old_rules;
rs->rules.inactive.ptr_array = old_array;
rs->rules.inactive.rcount = old_rcount;
+   rs->rules.inactive.array_sz = old_array_sz;
 
rs->rules.active.ticket = rs->rules.inactive.ticket;
pf_calc_skip_steps(rs->rules.active.ptr);
@@ -843,10 +871,12 @@ pf_commit_rules(u_int32_t ticket, char *anchor)
while ((rule = TAILQ_FIRST(old_rules)) != NULL)
pf_rm_rule(old_rules, rule);
if (rs->rules.inactive.ptr_array)
-   free(rs->rules.inactive.ptr_array, M_TEMP, 0);
+   free(rs->rules.inactive.ptr_array, M_TEMP,
+   rs->rules.inactive.array_sz);
rs->rules.inactive.ptr_array = NULL;
rs->rules.inactive.rcount = 0;
rs->rules.inactive.open = 0;
+   rs->rules.inactive.array_sz = 0;
pf_remove_if_empty_ruleset(rs);
 
/* queue defs only in the main ruleset */
@@ -864,10 +894,14 @@ pf_setup_pfsync_matching(struct pf_ruleset *rs)
 
MD5Init();
if (rs->rules.inactive.ptr_array)
-   free(rs->rules.inactive.ptr_array, M_TEMP, 0);
+   free(rs->rules.inactive.ptr_array, M_TEMP,
+   rs->rules.inactive.array_sz);
rs->rules.inactive.ptr_array = NULL;
+   rs->rules.inactive.array_sz = 0;
 
if (rs->rules.inactive.rcount) {
+   rs->rules.inactive.array_sz =
+   rs->rules.inactive.rcount * sizeof(caddr_t);
rs->rules.inactive.ptr_array =
mallocarray(rs->rules.inactive.rcount, sizeof(caddr_t),
M_TEMP, M_NOWAIT);
diff --git a/sys/net/pfvar.h b/sys/net/pfvar.h
index f06a1fa0e07..acdafcbc971 100644
--- a/sys/net/pfvar.h
+++ b/sys/net/pfvar.h
@@ -926,6 +926,7 @@ struct pf_ruleset {
struct pf_rule  **ptr_array;
u_int32_trcount;
u_int32_tticket;
+   size_t   

Re: pf: remove ptr_array from struct pf_ruleset

2020-07-14 Thread Alexandr Nedvedicky
Hello Klemens,



> `ptr_array' is allocated momentarily through mallocarray(9) and gets
> filled with the TAILQ entries, the sole user pfsync(4) can then access
> the list of rules by index to simply pick the n-th rule in a ruleset
> during state insertion.
> 
> I came here due to the zero size in its free(9) call, but the diff below
> removes the array completely and makes pfsync iterate over the TAILQ to
> get "index" of the matching rule in the ruleset.
> 
> I've been successfully running this on amd64 firewalls with pfsync(4)
> where I also added expiring `once' rules to further test with.
> 
> Feedback? OK?
> 

my only concern is performance impact. your diff trades O(1) for O(n) just
to handle expiration of PFRULE_ONCE correctly. However the extra costs here
might be bit reduced by fact, we do the look up just in case receive brand
new state from peer.

looking at the problem from different angle:

if there would be no PFRULE_ONCE rules, the bug won't exist

I vaguely remember the question of removing 'PFRULE_ONCE' rule is still
unanswered. It seems to me the PFRULE_ONCE is supposed to be used by
tftp-proxy only. However the code there is still protected by 'notyet'
ifdef guard.

if we would get your diff in, then IMHO the cost of keeping rarely used
PFRULE_ONCE feature, will further increase. I just don't know if the
new cost will still be acceptable, therefore I hesitate with my OK for
your change, which seems to look good otherwise.

thanks and
regards
sashan



Re: pf "route-to least-state" in an anchor doesn't work

2020-06-03 Thread Alexandr Nedvedicky
Hello Yasuoka,

I'm OK with your change.

However I would like to ask you to do yet another test.  I wonder if things
will eventually work on unfixed PF if rules will be constructed as follows:

pfctl -a test -t LB -T add 10.0.0.11@pair102

echo 'pass in on rdomain 102 quick proto tcp to 10.0.0.101 port 8080 \
keep state ( sloppy ) route-to  \
least-states sticky-address' |pfctl -a test -f -

echo 'anchor test' | pfctl -f -

pfctl -e

I suspect the bug you've found and fixed happens when pfctl loads rules
from pf.conf. I think the steps above will take a different route
through the code, which avoids pfr_ina_define() (a.k.a. transactions).

I don't have a test system readily available and I'm just curious
if anything changes or not. Thanks for finding that for me.

As I've said I think your change should go in.

OK sashan



Re: mcx(4) checksum offload

2020-05-18 Thread Alexandr Nedvedicky
Hello Jonathan,

I think it makes sense to turn it on on inbound path. we certainly
get a performance boost if HW verifies header checksums for us.

I'm not sure about outbound path. chksum offload on outbound side got killed
back in 2016 (I think). All that logic behind dealing with various HW did not
pay off. If PF is doing NAT it's far more simple and reliable if we compute
chksum in OS. 



> @@ -6343,6 +6657,7 @@ mcx_start(struct ifqueue *ifq)
>   sqe->sqe_signature = htobe32(MCX_SQE_CE_CQE_ALWAYS);
>  
>   /* eth segment */
> + sqe->sqe_mss_csum = htobe32(MCX_SQE_L3_CSUM | MCX_SQE_L4_CSUM);
>   sqe->sqe_inline_header_size = htobe16(MCX_SQ_INLINE_SIZE);
>   m_copydata(m, 0, MCX_SQ_INLINE_SIZE,
>   (caddr_t)sqe->sqe_inline_headers);
> 

I assume code above turns out HW chksums on outbound packets. I think
it does not matter much, because outbound packet currently arrives with
chksum calculated by OS anyway.

Also there might be some edge-cases, when we want to deliberately inject a
packet with broken header checksums to network. I'm afraid the HW would
fix such broken packets for us.

As I've said: I think it makes sense to enable HW offload on inbound path to
verify chksums. 

thanks and
regards
sashan



Re: pfctl_parser.c vs. __KAME

2020-05-03 Thread Alexandr Nedvedicky
Hello JCA,

thanks for the pointers.

> about this specific piece of code in pfctl, but the "kame hack" is still
> here and you really want to double check before removing such chunks in

...so pfctl is the wrong place to start with removal.

thanks and
regards
sashan



pfctl_parser.c vs. __KAME

2020-05-03 Thread Alexandr Nedvedicky
Hello,

the question has popped up while on recent code review of some Solaris specific
bug fixes: do we still need a code in diff below or is it OK to proceed
and commit the diff?

The chunk below uses bytes 2 and 3 to derive a scope of link local address.  It
seems to me those bytes/octets are always zero in IPv6 link scope addresses
these days, unless I'm missing something.  I was not able to identify its
counter part in current kernel, which would make octets 2 & 3 non-zero, so it
makes me thinking it should be OK to remove the whole chunk below.


thanks and
regards
sashan

note __KAME__ is defined, the chunk below is getting compiled currently.

8<---8<---8<--8<
diff --git a/sbin/pfctl/pfctl_parser.c b/sbin/pfctl/pfctl_parser.c
index cef0aa2474f..24175de046c 100644
--- a/sbin/pfctl/pfctl_parser.c
+++ b/sbin/pfctl/pfctl_parser.c
@@ -1361,21 +1361,6 @@ ifa_load(void)
err(1, "%s: calloc", __func__);
n->af = ifa->ifa_addr->sa_family;
n->ifa_flags = ifa->ifa_flags;
-#ifdef __KAME__
-   if (n->af == AF_INET6 &&
-   IN6_IS_ADDR_LINKLOCAL(&((struct sockaddr_in6 *)
-   ifa->ifa_addr)->sin6_addr) &&
-   ((struct sockaddr_in6 *)ifa->ifa_addr)->sin6_scope_id ==
-   0) {
-   struct sockaddr_in6 *sin6;
-
-   sin6 = (struct sockaddr_in6 *)ifa->ifa_addr;
-   sin6->sin6_scope_id = sin6->sin6_addr.s6_addr[2] << 8 |
-   sin6->sin6_addr.s6_addr[3];
-   sin6->sin6_addr.s6_addr[2] = 0;
-   sin6->sin6_addr.s6_addr[3] = 0;
-   }
-#endif
n->ifindex = 0;
if (n->af == AF_LINK)
n->ifindex = ((struct sockaddr_dl *)



Re: Simplify NET_LOCK() variations

2020-04-28 Thread Alexandr Nedvedicky
Hello,

> That's a bug, updated diff below.
> 
OK I see. the diff looks better then.

> If there's a consensus that this is a way to move forward, it would make
> sense to commit it after unlock. 
> 

I have not spot anything else. I think this change should go in.

OK sashan@



Re: Simplify NET_LOCK() variations

2020-04-16 Thread Alexandr Nedvedicky
Hello,

On Thu, Apr 16, 2020 at 10:57:42AM +0200, Martin Pieuchot wrote:
> On 14/04/20(Tue) 10:08, Martin Pieuchot wrote:
> > On 13/04/20(Mon) 03:20, Alexandr Nedvedicky wrote:
> > > On Sun, Apr 12, 2020 at 07:02:43PM +0200, Mark Kettenis wrote:
> > > > > From: "Theo de Raadt" 
> > > > > Date: Sun, 12 Apr 2020 10:28:59 -0600
> > > > > 
> > > > > > + if ((p->p_flag & P_SYSTEM) &&
> > > > > > + (strncmp(p->p_p->ps_comm, "softnet", 7) == 0))
> > > > > 
> > > > > Wow that is ugly.  
> > > > 
> > > > A better approach might be to store a pointer to the softnet task's
> > > > struct proc in a global variable and check that.  That is what we do
> > > > for the pagedaemon for example.
> > 
> > Diff below implements that by introducing the in_taskq() function, any
> > comment on that approach?
> > 
> > [...] 
> > Diff below uses NET_RLOCK_IN_SOFTNET() with a corresponding KASSERT()
> > and converts all the places where tasks are enqueued on the "softnet"
> > taskq.
> 
> Do we consider this approach good enough to explicitly document the 2,5
> years old state of locking of the network stack?

having this ASSERT already would helped us to discover the bug reported
by Richard much faster and easier.
> 
> Is the in_taskq() approach overkilled or is it fine?  Or are the names
> good enough?

I like it more than my 'magic' flag idea I shared earlier.
I'm fine with names. They are explicit enough.

> 
> Could this change have helped avoid the previous mistake? 

I would say yes.
> 
> In other words, are you ok with it?

I'm OK with it. However I would like to know some more details
on the construct below:

> @@ -356,7 +367,17 @@ taskq_thread(void *xtq)
>  {
>   struct taskq *tq = xtq;
>   struct task work;
> - int last;
> + int last, i;
> +
> + mtx_enter(>tq_mtx);
> + for (i = 0; i < tq->tq_nthreads; i++) {
> + if (tq->tq_threads[i] == NULL) {
> + tq->tq_threads[i] = curproc;
> + break;
> + }
> + }
> + KASSERT(i < tq->tq_nthreads);
> + mtx_leave(>tq_mtx);
>  
>   if (ISSET(tq->tq_flags, TASKQ_MPSAFE))
>   KERNEL_UNLOCK();
> @@ -381,4 +402,17 @@ taskq_thread(void *xtq)
>   wakeup_one(>tq_running);
>  
>   kthread_exit(0);
> +}

sorry if my question sounds stupid. The snippet above indicates
the task queue threads stay around forever (once created), right?
I'm asking, because I see no place where we do 'tq_threads[i] = NULL'.

thanks and
regards
sashan



Re: Simplify NET_LOCK() variations

2020-04-12 Thread Alexandr Nedvedicky
Hallo,

On Sun, Apr 12, 2020 at 07:02:43PM +0200, Mark Kettenis wrote:
> > From: "Theo de Raadt" 
> > Date: Sun, 12 Apr 2020 10:28:59 -0600
> > 
> > > + if ((p->p_flag & P_SYSTEM) &&
> > > + (strncmp(p->p_p->ps_comm, "softnet", 7) == 0))
> > 
> > Wow that is ugly.  
> 
> A better approach might be to store a pointer to the softnet task's
> struct proc in a global variable and check that.  That is what we do
> for the pagedaemon for example.
> 

I'm not sure same thing would work for network task. Currently
there is a single instance of pagedaemon, however we hope to
have more network tasks running in parallel.

How about sticking a magic flag to proc and task? The idea is just
shown in diff below. Just idea I have not tried to compile it.

regards
sashan

8<---8<---8<--8<
diff --git a/sys/kern/kern_task.c b/sys/kern/kern_task.c
index 5ed4e3a7c39..0698041c3aa 100644
--- a/sys/kern/kern_task.c
+++ b/sys/kern/kern_task.c
@@ -53,6 +53,7 @@ struct taskq {
 #ifdef WITNESS
struct lock_object   tq_lock_object;
 #endif
+   int  tq_magic;
 };
 
 static const char taskq_sys_name[] = "systq";
@@ -129,6 +130,7 @@ taskq_create(const char *name, unsigned int nthreads, int 
ipl,
tq->tq_nthreads = nthreads;
tq->tq_name = name;
tq->tq_flags = flags;
+   tq->tq_magic = 0;
 
mtx_init_flags(>tq_mtx, ipl, name, 0);
TAILQ_INIT(>tq_worklist);
@@ -146,6 +148,18 @@ taskq_create(const char *name, unsigned int nthreads, int 
ipl,
return (tq);
 }
 
+struct taskq *
+taskq_create_magic(const char *name, unsigned int nthreads, int ipl,
+unsigned int flags, int magic)
+{
+   struct taskq *tq;
+
+   tq = taskq_create(nmae, nthreads, ipl, flags);
+   if (tq != NULL)
+   tq->tq_magic = magic;
+   return (tq);
+}
+
 void
 taskq_destroy(struct taskq *tq)
 {
@@ -364,8 +378,11 @@ taskq_thread(void *xtq)
WITNESS_CHECKORDER(>tq_lock_object, LOP_NEWORDER, NULL);
 
while (taskq_next_work(tq, )) {
+   struct proc *p = curproc;
WITNESS_LOCK(>tq_lock_object, 0);
+   p->p_magic = tq->tq_magic;
(*work.t_func)(work.t_arg);
+   p->p_magic = 0;
WITNESS_UNLOCK(>tq_lock_object, 0);
sched_pause(yield);
}
diff --git a/sys/net/if.c b/sys/net/if.c
index 176fcf849fd..442a7bcec9a 100644
--- a/sys/net/if.c
+++ b/sys/net/if.c
@@ -266,7 +266,8 @@ ifinit(void)
timeout_set(_tick_to, net_tick, _tick_to);
 
for (i = 0; i < NET_TASKQ; i++) {
-   nettqmp[i] = taskq_create("softnet", 1, IPL_NET, TASKQ_MPSAFE);
+   nettqmp[i] = taskq_create_magic("softnet", 1, IPL_NET,
+   TASKQ_MPSAFE, 0x0030fd7ed); /* SOFTNET task */
if (nettqmp[i] == NULL)
panic("unable to create network taskq %d", i);
}
diff --git a/sys/sys/proc.h b/sys/sys/proc.h
index a25cb52951d..51ac96da5cc 100644
--- a/sys/sys/proc.h
+++ b/sys/sys/proc.h
@@ -408,6 +408,7 @@ struct proc {
union sigval p_sigval;  /* For core dump/debugger XXX */
longp_sitrapno; /* For core dump/debugger XXX */
int p_sicode;   /* For core dump/debugger XXX */
+   int p_magic;/* Identifies process context (for asserts) */
 };
 
 /* Status values. */



'pfctl -L state.file' grabs state lock recursively

2020-04-03 Thread Alexandr Nedvedicky
Hello,

The issue has been found by Chris Cappuccio. The good news is it can not be
triggered by default. To trigger the bug one has to build kernel with
'WITH_PF_LOCK' option. PF_STATE_ENTER_WRITE(), which is no-op by default,
becomes operational, when WITH_PF_LOCK is defined.

the 'pfctl -L state.file' loads states from file to PF driver. It uses
pfsync_state_import(). The pfsync_state_import() takes care of state
table locking, thus pfioctl() (the caller) should not bother.

OK?

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index e1810afe156..8060d605ea9 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -1682,9 +1682,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
break;
}
PF_LOCK();
-   PF_STATE_ENTER_WRITE();
error = pfsync_state_import(sp, PFSYNC_SI_IOCTL);
-   PF_STATE_EXIT_WRITE();
PF_UNLOCK();
break;
}



pf_state_key_link_reverse() needs atomic ops -- resending

2020-04-03 Thread Alexandr Nedvedicky
Hello,

my apologize to resend the same diff [1]. I'm not sure I got OK or not.
It can be the case the privately received OK got lost.

The change is required to allow multiple instances of pf_test() running
concurrently. Without this change in, PF trips 'KASSERT(sk->reverse == NULL);'

thanks and
regards
sashan


[1] 
http://openbsd-archive.7691.n7.nabble.com/pf-state-key-link-reverse-needs-atomic-ops-td367859.html

8<---8<---8<--8<
diff --git a/sys/net/pf.c b/sys/net/pf.c
index ebe339921fa..738759e556c 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -7398,11 +7398,20 @@ pf_inp_unlink(struct inpcb *inp)
 void
 pf_state_key_link_reverse(struct pf_state_key *sk, struct pf_state_key *skrev)
 {
-   /* Note that sk and skrev may be equal, then we refcount twice. */
-   KASSERT(sk->reverse == NULL);
-   KASSERT(skrev->reverse == NULL);
-   sk->reverse = pf_state_key_ref(skrev);
-   skrev->reverse = pf_state_key_ref(sk);
+   struct pf_state_key *old_reverse;
+
+   old_reverse = atomic_cas_ptr(>reverse, NULL, skrev);
+   if (old_reverse != NULL)
+   KASSERT(old_reverse == skrev);
+   else
+   pf_state_key_ref(skrev);
+
+
+   old_reverse = atomic_cas_ptr(>reverse, NULL, sk);
+   if (old_reverse != NULL)
+   KASSERT(old_reverse == sk);
+   else
+   pf_state_key_ref(sk);
 }
 
 #if NPFLOG > 0



change to pfsync(4) reduces a PF_LOCK scope

2020-03-16 Thread Alexandr Nedvedicky
Hello,

patch below is a fairly large change, which reduces a scope of PF_LOCK().
Whenever pf(4) needs to send state table update to its peer, it must grab
a PF_LOCK() to serialize access to internal pfsync(4) queues. The patch
below adds a mutex to every queue pfsync's queue, so PF_LOCK() is no longer
required for outbound updates.  The inbound updates still require PF_LOCK() (+
state lock) in order to safely insert a state received from remote peer.

Hrvoje has been testing the diff for some time and he could see some issues,
which seems to be fixed now. Time has come to kindly ask more people to test
the diff and report issues back (either tech@ or bugs@).

It's worth to test patch as-is, to make sure no issues are introduced to
default compile time settings.

Brave hearts may enable 'WITH_PF_LOCK' option like that:

8<---8<---8<--8<
--- a/sys/arch/amd64/conf/GENERIC.MP
+++ b/sys/arch/amd64/conf/GENERIC.MP
@@ -3,6 +3,7 @@
 include "arch/amd64/conf/GENERIC"
 
 option MULTIPROCESSOR
+option WITH_PF_LOCK
 #optionMP_LOCKDEBUG
 #optionWITNESS
8<---8<---8<--8<

The tweak above enables the changes. Also keep in mind the tweak above
is not enough to get parallel execution of PF. Few more small changes
around IP input is required to enable that. I'll send another diff as
a reply to this email later tomorrow.


thanks and
regards
sashan
8<---8<---8<--8<
diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
index 34dd370078a..c7b01bab9fa 100644
--- a/sys/net/if_pfsync.c
+++ b/sys/net/if_pfsync.c
@@ -210,9 +210,11 @@ struct pfsync_softc {
struct ipsc_template;
 
struct pf_state_queuesc_qs[PFSYNC_S_COUNT];
+   struct mutex sc_mtx[PFSYNC_S_COUNT];
size_t   sc_len;
 
struct pfsync_upd_reqs   sc_upd_req_list;
+   struct mutex sc_upd_req_mtx;
 
int  sc_initial_bulk;
int  sc_link_demoted;
@@ -220,6 +222,7 @@ struct pfsync_softc {
int  sc_defer;
struct pfsync_deferrals  sc_deferrals;
u_intsc_deferred;
+   struct mutex sc_deferrals_mtx;
 
void*sc_plus;
size_t   sc_pluslen;
@@ -234,6 +237,7 @@ struct pfsync_softc {
struct timeout   sc_bulk_tmo;
 
TAILQ_HEAD(, tdb)sc_tdb_q;
+   struct mutex sc_tdb_mtx;
 
struct task  sc_ltask;
struct task  sc_dtask;
@@ -241,6 +245,16 @@ struct pfsync_softc {
struct timeout   sc_tmo;
 };
 
+struct pfsync_snapshot {
+   struct pfsync_softc *sn_sc;
+   struct pf_state_queuesn_qs[PFSYNC_S_COUNT];
+   struct pfsync_upd_reqs   sn_upd_req_list;
+   TAILQ_HEAD(, tdb)sn_tdb_q;
+   size_t   sn_len;
+   void*sn_plus;
+   size_t   sn_pluslen;
+};
+
 struct pfsync_softc*pfsyncif = NULL;
 struct cpumem  *pfsynccounters;
 
@@ -276,6 +290,10 @@ void   pfsync_bulk_start(void);
 void   pfsync_bulk_status(u_int8_t);
 void   pfsync_bulk_update(void *);
 void   pfsync_bulk_fail(void *);
+
+void   pfsync_grab_snapshot(struct pfsync_snapshot *, struct pfsync_softc *);
+void   pfsync_drop_snapshot(struct pfsync_snapshot *);
+
 #ifdef WITH_PF_LOCK
 void   pfsync_send_dispatch(void *);
 void   pfsync_send_pkt(struct mbuf *);
@@ -307,6 +325,13 @@ pfsync_clone_create(struct if_clone *ifc, int unit)
struct pfsync_softc *sc;
struct ifnet *ifp;
int q;
+   static const char *mtx_names[] = {
+   "iack_mtx",
+   "upd_c_mtx",
+   "del_mtx",
+   "ins_mtx",
+   "upd_mtx",
+   "" };
 
if (unit != 0)
return (EINVAL);
@@ -314,18 +339,23 @@ pfsync_clone_create(struct if_clone *ifc, int unit)
pfsync_sync_ok = 1;
 
sc = malloc(sizeof(*pfsyncif), M_DEVBUF, M_WAITOK|M_ZERO);
-   for (q = 0; q < PFSYNC_S_COUNT; q++)
+   for (q = 0; q < PFSYNC_S_COUNT; q++) {
TAILQ_INIT(>sc_qs[q]);
+   mtx_init_flags(>sc_mtx[q], IPL_SOFTNET, mtx_names[q], 0);
+   }
 
pool_init(>sc_pool, PFSYNC_PLSIZE, 0, IPL_SOFTNET, 0, "pfsync",
NULL);
TAILQ_INIT(>sc_upd_req_list);
+   mtx_init(>sc_upd_req_mtx, IPL_SOFTNET);
TAILQ_INIT(>sc_deferrals);
+   mtx_init(>sc_deferrals_mtx, IPL_SOFTNET);
task_set(>sc_ltask, pfsync_syncdev_state, sc);
task_set(>sc_dtask, pfsync_ifdetach, sc);
sc->sc_deferred = 0;
 
TAILQ_INIT(>sc_tdb_q);
+   mtx_init(>sc_tdb_mtx, IPL_SOFTNET);
 
sc->sc_len = PFSYNC_MINPKT;
sc->sc_maxupdates = 128;

Re: sbin/pfctl: use TAILQ_CONCAT(3)

2020-01-27 Thread Alexandr Nedvedicky
Hello,

On Mon, Jan 27, 2020 at 08:54:19PM +0100, Björn Ketelaars wrote:
> Replace custom TAILQ concatenation loop by TAILQ_CONCAT(3).
> 
> Comments/OK?

looks good to me.

OK sashan@



Re: vlan: use SMR_SLIST instead of SRP to protect instance lists

2020-01-26 Thread Alexandr Nedvedicky
Hello,

> 
> I agree we should protect against that - if I add a sleep there, I can easily
> cause a panic with 'ifconfig vlan10 destroy & ifconfig vlan10 destroy'.
> I think that's a separate issue though, as the same window is present in
> the existing code.
> 

that's true. it's refcnt_finalize() in vlan_clone_destroy(), what's made it
somewhat more obvious.

> This fixes it, at least in my brief testing:
> 
> Index: if_vlan.c
> ===
> RCS file: /cvs/src/sys/net/if_vlan.c,v
> retrieving revision 1.202
> diff -u -p -u -p -r1.202 if_vlan.c
> --- if_vlan.c 7 Nov 2019 07:36:32 -   1.202
> +++ if_vlan.c 27 Jan 2020 03:47:24 -
> @@ -249,6 +249,10 @@ vlan_clone_destroy(struct ifnet *ifp)
>   struct vlan_softc *sc = ifp->if_softc;
>  
>   NET_LOCK();
> + if (sc->sc_dead != 0) {
> + NET_UNLOCK();
> + return (ENXIO);
> + }
>   sc->sc_dead = 1;
>  
>   if (ISSET(ifp->if_flags, IFF_RUNNING))
> 

yes, that's perfect. I'm OK with your change.

thanks and
regards
sashan



Re: vlan: use SMR_SLIST instead of SRP to protect instance lists

2020-01-26 Thread Alexandr Nedvedicky
Hello,


> > I'm not sure about your diff see below.
> 
> visa@ pointed out that PF_LOCK will introduce a potential sleep inside 
> if_vinput(),
> making this unsafe, so here's a less optimistic diff where only the vlan list
> traversal is inside the SMR read section, and vlan_softcs are still 
> refcounted.
> I probably need to rework some other unsent SRP->SMR diffs now too.

less optimistic variant looks better. Though I have one
question related to refcnt_finalize().



> @@ -257,6 +239,7 @@ vlan_clone_destroy(struct ifnet *ifp)
>  
>   ether_ifdetach(ifp);
>   if_detach(ifp);
> + smr_barrier();
>   refcnt_finalize(>sc_refcnt, "vlanrefs");

I think a protection should be added so there will be at most 1 thread
calling to refcnt_finalize(). I've noticed vlan_clone_destroy() does
->sc_dead = 1; couple line earlier under protection of NET_LOCK(). Would it
make sense to do if (->sc_dead == 1) first? We would just bail out if
->sc_dead will be set already, because there is vlan_clone_destroy() in
progress already.

Otherwise the change looks good.

thanks and
regards
sashan



Re: SMP friendly pfsync(4) ready for testing

2020-01-26 Thread Alexandr Nedvedicky
Hello,

here is a revised diff, which makes my change to play nicely
with witness. Hrvoje found a panic with stack below when
running my diff with witness enabled:

witness: acquiring duplicate lock of same type: ">sc_mtx[q]"
 1st >sc_mtx[q]
 2nd >sc_mtx[q]
Starting stack trace...
witness_checkorder(8058d8c8,9,0) at witness_checkorder+0x6ba
mtx_enter(8058d8b8) at mtx_enter+0x34
pfsync_grab_snapshot(800022478a78,8058d000) at
pfsync_grab_snapshot+0x3c
pfsync_sendout() at pfsync_sendout+0x8b
if_netisr(0) at if_netisr+0x175
taskq_thread(8002f080) at taskq_thread+0x67
end trace frame: 0x0, count: 251
End of stack trace.

The first iteration of my diff I've sent earlier requires small
change:

@@ -325,6 +325,13 @@ pfsync_clone_create(struct if_clone *ifc, int unit)
struct pfsync_softc *sc;
struct ifnet *ifp;
int q;
+   static const char *mtx_names[] = {
+   "iack_mtx",
+   "upd_c_mtx",
+   "del_mtx",
+   "ins_mtx",
+   "upd_mtx",
+   "" };
 
if (unit != 0)
return (EINVAL);
@@ -334,7 +341,7 @@ pfsync_clone_create(struct if_clone *ifc, int unit)
sc = malloc(sizeof(*pfsyncif), M_DEVBUF, M_WAITOK|M_ZERO);
for (q = 0; q < PFSYNC_S_COUNT; q++) {
TAILQ_INIT(>sc_qs[q]);
-   mtx_init(>sc_mtx[q], IPL_SOFTNET);
+   mtx_init_flags(>sc_mtx[q], IPL_SOFTNET, mtx_names[q], 
0);
}
 
pool_init(>sc_pool, PFSYNC_PLSIZE, 0, IPL_SOFTNET, 0, "pfsync",

the complete diff follows.

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
index 34dd370078a..c7b01bab9fa 100644
--- a/sys/net/if_pfsync.c
+++ b/sys/net/if_pfsync.c
@@ -210,9 +210,11 @@ struct pfsync_softc {
struct ipsc_template;
 
struct pf_state_queuesc_qs[PFSYNC_S_COUNT];
+   struct mutex sc_mtx[PFSYNC_S_COUNT];
size_t   sc_len;
 
struct pfsync_upd_reqs   sc_upd_req_list;
+   struct mutex sc_upd_req_mtx;
 
int  sc_initial_bulk;
int  sc_link_demoted;
@@ -220,6 +222,7 @@ struct pfsync_softc {
int  sc_defer;
struct pfsync_deferrals  sc_deferrals;
u_intsc_deferred;
+   struct mutex sc_deferrals_mtx;
 
void*sc_plus;
size_t   sc_pluslen;
@@ -234,6 +237,7 @@ struct pfsync_softc {
struct timeout   sc_bulk_tmo;
 
TAILQ_HEAD(, tdb)sc_tdb_q;
+   struct mutex sc_tdb_mtx;
 
struct task  sc_ltask;
struct task  sc_dtask;
@@ -241,6 +245,16 @@ struct pfsync_softc {
struct timeout   sc_tmo;
 };
 
+struct pfsync_snapshot {
+   struct pfsync_softc *sn_sc;
+   struct pf_state_queuesn_qs[PFSYNC_S_COUNT];
+   struct pfsync_upd_reqs   sn_upd_req_list;
+   TAILQ_HEAD(, tdb)sn_tdb_q;
+   size_t   sn_len;
+   void*sn_plus;
+   size_t   sn_pluslen;
+};
+
 struct pfsync_softc*pfsyncif = NULL;
 struct cpumem  *pfsynccounters;
 
@@ -276,6 +290,10 @@ void   pfsync_bulk_start(void);
 void   pfsync_bulk_status(u_int8_t);
 void   pfsync_bulk_update(void *);
 void   pfsync_bulk_fail(void *);
+
+void   pfsync_grab_snapshot(struct pfsync_snapshot *, struct pfsync_softc *);
+void   pfsync_drop_snapshot(struct pfsync_snapshot *);
+
 #ifdef WITH_PF_LOCK
 void   pfsync_send_dispatch(void *);
 void   pfsync_send_pkt(struct mbuf *);
@@ -307,6 +325,13 @@ pfsync_clone_create(struct if_clone *ifc, int unit)
struct pfsync_softc *sc;
struct ifnet *ifp;
int q;
+   static const char *mtx_names[] = {
+   "iack_mtx",
+   "upd_c_mtx",
+   "del_mtx",
+   "ins_mtx",
+   "upd_mtx",
+   "" };
 
if (unit != 0)
return (EINVAL);
@@ -314,18 +339,23 @@ pfsync_clone_create(struct if_clone *ifc, int unit)
pfsync_sync_ok = 1;
 
sc = malloc(sizeof(*pfsyncif), M_DEVBUF, M_WAITOK|M_ZERO);
-   for (q = 0; q < PFSYNC_S_COUNT; q++)
+   for (q = 0; q < PFSYNC_S_COUNT; q++) {
TAILQ_INIT(>sc_qs[q]);
+   mtx_init_flags(>sc_mtx[q], IPL_SOFTNET, mtx_names[q], 0);
+   }
 
pool_init(>sc_pool, PFSYNC_PLSIZE, 0, IPL_SOFTNET, 0, "pfsync",
NULL);
TAILQ_INIT(>sc_upd_req_list);
+   mtx_init(>sc_upd_req_mtx, IPL_SOFTNET);
TAILQ_INIT(>sc_deferrals);
+   mtx_init(>sc_deferrals_mtx, 

Re: vlan: use SMR_SLIST instead of SRP to protect instance lists

2020-01-26 Thread Alexandr Nedvedicky
Hello,


On Sun, Jan 26, 2020 at 10:06:24AM +1100, Jonathan Matthew wrote:
> Converting vlan(4) to use SMR instead of SRP to protect the instance lists
> is pretty simple.  vlan_input() doesn't keep a reference to vlan_softcs 
> outside
> the read critical section, so we don't need to reference count them any more.
> After the smr_barrier() in vlan_clone_destroy, all pointers to the vlan that
> were obtained from the instance lists have been dropped, so it's safe to free 
> it.
> 
> ok?

I'm not sure about your diff see below.


> @@ -415,7 +393,8 @@ vlan_input(struct ifnet *ifp0, struct mb
>   tag = EVL_VLANOFTAG(m->m_pkthdr.ether_vtag);
>  
>   list = [TAG_HASH(tag)];
> - SRPL_FOREACH(sc, , list, sc_list) {
> + smr_read_enter();
> + SMR_SLIST_FOREACH(sc, list, sc_list) {
>   if (ifp0->if_index == sc->sc_ifidx0 && tag == sc->sc_tag &&
>   etype == sc->sc_type)
>   break;
> @@ -456,14 +435,14 @@ vlan_input(struct ifnet *ifp0, struct mb
>  
>   if_vinput(>sc_if, m);
^^^
here we call if_input() inside a read section. SMR_CALL(9)
manpage says;

smr_read_enter(), and exited with smr_read_leave().  These routines
never block.  Sleeping is not allowed within SMR read-side critical
section.

If you can grant if_vinput() won't sleep, then your diff is OK. But I'm not
sure about it. However I must admit I might be missing something here.

thanks and
regards
sashan



Re: pf: remove 'one shot rules'

2020-01-25 Thread Alexandr Nedvedicky
Hello,

mikeb@ and me were poking about same idea some time ago (?2016?). But the idea
never turned to diff. If I remember correct the only meaningful use case we
could come up with for once rules is [t]ftp-proxy. But neither one seems to use
once rules at all. I'm OK with removing 'once' rules. From my point of view it
adds too much complexity for little win.


I've found just one nit in your diff so far, see below.

thanks and
regards
sashan


> Index: sys/net/pfvar.h
> ===
> RCS file: /cvs/src/sys/net/pfvar.h,v
> retrieving revision 1.493
> diff -u -p -u -p -r1.493 pfvar.h
> --- sys/net/pfvar.h   17 Nov 2019 08:25:05 -  1.493
> +++ sys/net/pfvar.h   24 Jan 2020 06:12:19 -
> @@ -596,10 +596,6 @@ struct pf_rule {
>   u_int16_t   port;
>   u_int8_ttype;
>   }   divert;
> -
> - SLIST_ENTRY(pf_rule) gcle;
> - struct pf_ruleset   *ruleset;
> - time_t   exptime;
>  };
>  
>  /* rule flags */
> @@ -617,7 +613,6 @@ struct pf_rule {
>  #define PFRULE_IFBOUND   0x0001  /* if-bound */
>  #define PFRULE_STATESLOPPY   0x0002  /* sloppy state tracking */
>  #define PFRULE_PFLOW 0x0004
> -#define PFRULE_ONCE  0x0010  /* one shot rule */
>  #define PFRULE_AFTO  0x0020  /* af-to rule */
>  #define  PFRULE_EXPIRED  0x0040  /* one shot rule hit by 
> pkt */

PFRULE_EXPIRED, can be also removed.



Re: MSI-X & Interrupting CPU > 0

2020-01-23 Thread Alexandr Nedvedicky
Hello,


> 
>  3. How does interrupting multiple CPUs influence packet processing in
> the softnet thread?  Is any knowledge required (CPU affinity?) to
> have an optimum processing when multiple softnet threads are used?
> 

I think it's hard to tell in advance. IMO we should try to make RX
running in parallel to create some pressure on code we already have.
We can get back to CPU affinity once we will have an experience
with code on multiple cores. Perhaps we should also slowly move from
thinking in terms of RX/forward/local/TX to more general pattern
I/O scheduling. I think we are not there yet.

I think we need experiments to gain the knowledge.

regards
sashan



Re: SMP friendly pfsync(4) ready for testing

2020-01-22 Thread Alexandr Nedvedicky
Hello,

sorry for typo in my earlier mail.

>   cd sys/arch/`uname -p`/conf
>   echo 'option WIT_PF_LOCK' >> GENERIC.MP
>   config GENERIC.MP
>   cd ../compile/GENERIC.MP/
>   make clean
>   make

do s/WIT_PF_LOCK/WITH_PF_LOCK in the snippet above.


my apologize.

thanks and
regards
sashan



SMP friendly pfsync(4) ready for testing

2020-01-21 Thread Alexandr Nedvedicky
Hello,

below is a diff, which makes pfsync(4) more pf_lock friendly. The diff is
sitting in my tree for some time. Hrvoje did some preliminary testing already.
He could trigger some sparks and smoke, which I could put off.  However don't
get too much excited about the change yet. At this phase I really need some
testing focused on spotting any undesired side effects. 

There should be no change in pf(4) + pfsync(4) behaviour in terms of
functionality or stability if diff is applied as-is. All SMP related changes in
pf/pfsync stuff are protected by WITH_PF_LOCK guard. I think it's still worth
to give a try and test diff below _with___out_ 'WITH_PF_LOCK' being set and
watching for changes in pfsync behavior.

Brave hearts can compile kernel with -DWITH_PF_LOCK. For example one can
something like this:
cd sys/arch/`uname -p`/conf
echo 'option WIT_PF_LOCK' >> GENERIC.MP
config GENERIC.MP
cd ../compile/GENERIC.MP/
make clean
make
The kernel, which is compiled with -DWITH_PF_LOCK option will execute a new
pf_lock friendly code in pfsync. But don't expect any performance improvement
yet. Remember there is no concurrency yet, which would run more instances of
pf_test() function.

If testing won't reveal any problems I'll re-post the diff asking for OKs.

The diff just extends current pfsync implementation so pfsync_update_state()
function can run concurrently.

Every state queue maintained by pfsync gets its own access to synchronize
insert/remove operations.

The send operation (pfsync_sendout()) is split to two steps:
the first stop is invoked synchronously on behalf of caller from pf
or periodic task. Here we grab mutex to grab all states from global
queues to local data (a snapshot). We release mutex once we have
snapshot.

The next step moves data from snapshot to pfsync packet. Once this
is done, we dispatch packet to task, which pushes packet to network.

The changes and reception side are not significant, we just teach pfsync
to grab PF_LOCK() and state rw-lock to insert states received from peer.


feedback and testing is much appreciated.

thanks and
regards
sashan


8<---8<---8<--8<
diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
index 34dd370078a..e0508623b97 100644
--- a/sys/net/if_pfsync.c
+++ b/sys/net/if_pfsync.c
@@ -210,9 +210,11 @@ struct pfsync_softc {
struct ipsc_template;
 
struct pf_state_queuesc_qs[PFSYNC_S_COUNT];
+   struct mutex sc_mtx[PFSYNC_S_COUNT];
size_t   sc_len;
 
struct pfsync_upd_reqs   sc_upd_req_list;
+   struct mutex sc_upd_req_mtx;
 
int  sc_initial_bulk;
int  sc_link_demoted;
@@ -220,6 +222,7 @@ struct pfsync_softc {
int  sc_defer;
struct pfsync_deferrals  sc_deferrals;
u_intsc_deferred;
+   struct mutex sc_deferrals_mtx;
 
void*sc_plus;
size_t   sc_pluslen;
@@ -234,6 +237,7 @@ struct pfsync_softc {
struct timeout   sc_bulk_tmo;
 
TAILQ_HEAD(, tdb)sc_tdb_q;
+   struct mutex sc_tdb_mtx;
 
struct task  sc_ltask;
struct task  sc_dtask;
@@ -241,6 +245,16 @@ struct pfsync_softc {
struct timeout   sc_tmo;
 };
 
+struct pfsync_snapshot {
+   struct pfsync_softc *sn_sc;
+   struct pf_state_queuesn_qs[PFSYNC_S_COUNT];
+   struct pfsync_upd_reqs   sn_upd_req_list;
+   TAILQ_HEAD(, tdb)sn_tdb_q;
+   size_t   sn_len;
+   void*sn_plus;
+   size_t   sn_pluslen;
+};
+
 struct pfsync_softc*pfsyncif = NULL;
 struct cpumem  *pfsynccounters;
 
@@ -276,6 +290,10 @@ void   pfsync_bulk_start(void);
 void   pfsync_bulk_status(u_int8_t);
 void   pfsync_bulk_update(void *);
 void   pfsync_bulk_fail(void *);
+
+void   pfsync_grab_snapshot(struct pfsync_snapshot *, struct pfsync_softc *);
+void   pfsync_drop_snapshot(struct pfsync_snapshot *);
+
 #ifdef WITH_PF_LOCK
 void   pfsync_send_dispatch(void *);
 void   pfsync_send_pkt(struct mbuf *);
@@ -314,18 +332,23 @@ pfsync_clone_create(struct if_clone *ifc, int unit)
pfsync_sync_ok = 1;
 
sc = malloc(sizeof(*pfsyncif), M_DEVBUF, M_WAITOK|M_ZERO);
-   for (q = 0; q < PFSYNC_S_COUNT; q++)
+   for (q = 0; q < PFSYNC_S_COUNT; q++) {
TAILQ_INIT(>sc_qs[q]);
+   mtx_init(>sc_mtx[q], IPL_SOFTNET);
+   }
 
pool_init(>sc_pool, PFSYNC_PLSIZE, 0, IPL_SOFTNET, 0, "pfsync",
NULL);
TAILQ_INIT(>sc_upd_req_list);
+   mtx_init(>sc_upd_req_mtx, IPL_SOFTNET);
TAILQ_INIT(>sc_deferrals);
+   mtx_init(>sc_deferrals_mtx, IPL_SOFTNET);

Re: Unbreak X:Y user/group spec in pf.conf

2020-01-16 Thread Alexandr Nedvedicky
Hello,



> > +.Pp
> > +Note that users 1000 and 1500 are excluded from the pass rule.
> 
> The last line above is a little hard to parse - I think a "positive
> example" would be clearer, i.e. something like this:
> 
> .Pp
> The example below permits users with uid between 1000 and 1500
> to open connections:
> .Bd -literal -offset indent
> block out proto tcp all
> pass  out proto tcp from self user { 999 >< 1501 }
> .Ed
> .Pp
> The
> .Sq \&:
> operator, which works for port number matching, does not work for
> [...]
> 

I like your suggestion, diff below fixes extra white space and
uses Stuart's wording.

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/share/man/man5/pf.conf.5 b/share/man/man5/pf.conf.5
index 452a15d1cfd..f847aa7fe32 100644
--- a/share/man/man5/pf.conf.5
+++ b/share/man/man5/pf.conf.5
@@ -820,6 +820,21 @@ connections:
 block out proto tcp all
 pass  out proto tcp from self user { < 1000, dhartmei }
 .Ed
+.Pp
+The example below permits users with uid between 1000 and 1500
+to open connections:
+.Bd -literal -offset indent
+block out proto tcp all
+pass  out proto tcp from self user { 999 >< 1501 }
+.Ed
+.Pp
+The
+.Sq \&:
+operator, which works for port number matching, does not work for
+.Cm user
+and
+.Cm group
+match.
 .El
 .Ss Translation
 Translation options modify either the source or destination address and



Re: Unbreak X:Y user/group spec in pf.conf

2020-01-16 Thread Alexandr Nedvedicky
Hello,


> > +of uids, which match the pass rule. The 
> New sentences on its own line.  I'd say
> 
>   Note that users 1000 and 1500 are excluded from the pass rule.
> 

yes, new sentence on the new line. and your wording sounds better.

> > +.Cm :
> The port paragraph marks up those operators with Sq (single quotes),
> we should be consistent here.  Cm for user and group is correct, though.

fixed.

updated manpage is below.

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/share/man/man5/pf.conf.5 b/share/man/man5/pf.conf.5
index 452a15d1cfd..fe99dc0c726 100644
--- a/share/man/man5/pf.conf.5
+++ b/share/man/man5/pf.conf.5
@@ -820,6 +820,22 @@ connections:
 block out proto tcp all
 pass  out proto tcp from self user { < 1000, dhartmei }
 .Ed
+.Pp
+The example below specifies a range of users to open outgoing
+connections:
+.Bd -literal -offset indent
+block out proto tcp all
+pass  out proto tcp from self user { 1000 >< 1500 }
+.Ed
+.Pp
+Note that users 1000 and 1500 are excluded from the pass rule.
+The 
+.Sq \&:
+operator, which works for port number matching, does not work for
+.Cm user
+and
+.Cm group
+match.
 .El
 .Ss Translation
 Translation options modify either the source or destination address and



Re: Unbreak X:Y user/group spec in pf.conf

2020-01-16 Thread Alexandr Nedvedicky
Hello,

> 
> (2Theo: yes, I'm lazy, sorry :) )
> 
> I agree, that "X:Y" syntax for "user" could be confusing, and "X> simply ugly. I do not have a silver bullet here, though.
> 
> If you oppose the proposed change, I'll add "... except 'uid1:uid2' syntax,
> which could be mistakenly interpreted as 'uid:gid'" to pf.conf(5). Will be
> that okay?


I think that's where we are heading after reading email from sthen@

Let's focus on to update pf.conf.5 manpage. Would diff below make pf.conf.5
manpage more useful?

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/share/man/man5/pf.conf.5 b/share/man/man5/pf.conf.5
index 452a15d1cfd..42c3c3466da 100644
--- a/share/man/man5/pf.conf.5
+++ b/share/man/man5/pf.conf.5
@@ -820,6 +820,22 @@ connections:
 block out proto tcp all
 pass  out proto tcp from self user { < 1000, dhartmei }
 .Ed
+.Pp
+The example below specifies a range of users to open outgoing
+connections:
+.Bd -literal -offset indent
+block out proto tcp all
+pass  out proto tcp from self user { 1000 >< 1500 }
+.Ed
+.Pp
+Note the range above excludes 1000 and 1500 uids from list
+of uids, which match the pass rule. The 
+.Cm :
+operator, which works for port number matching, does not work for
+.Cm user
+and
+.Cm group
+match.
 .El
 .Ss Translation
 Translation options modify either the source or destination address and



Re: Unbreak X:Y user/group spec in pf.conf

2020-01-16 Thread Alexandr Nedvedicky
Hello,


> 
> > Looks like Vadim found a bug and I'll take a look at the patch
> > he has sent.
> Where do you see a bug?
> 

at description of 'user' match the pf.conf(5) reads as follows:

 User and group IDs can be specified as either numbers or names.
 The syntax is similar to the one for ports.  The following
 example allows only selected users to open outgoing connections:

   block out proto tcp all
   pass  out proto tcp from self user { < 1000, dhartmei }

sentence 'The syntax is similar to the one for ports' sets my expectations
I can define a range of users in the same way I define a range of ports.
Looks useful to me, though a bug in parse.y might be just a tip of iceberg
here.

regards
sashan



Re: Unbreak X:Y user/group spec in pf.conf

2020-01-16 Thread Alexandr Nedvedicky
Hello,

just to clarify the user and group match in pf.conf

On Wed, Jan 15, 2020 at 11:14:43PM -0700, Theo de Raadt wrote:
> I'll bite, using text from your regress.
> 
> > +pass out proto tcp all user 1234:12345 flags S/SA
> > +pass out proto tcp all user 0:12345 flags S/SA
> > +pass out proto tcp all group 1234:12345 flags S/SA
> > +pass out proto tcp all group 0:12345 flags S/SA
> 
> What does 1234:12345 mean.  It must be uid 1234 _and_ gid 12345?

according to my understanding 'user 1234:12345' matches
_all_ user IDs in range, which starts with 1234 and ends 12345.
The ranges are explained at paragraph, which discusses ports: 

 Ports and ranges of ports are specified using these operators:

   =   (equal)
   !=  (unequal)
   <   (less than)
   <=  (less than or equal)
   >   (greater than)
   >=  (greater than or equal)
   :   (range including boundaries)
   ><  (range excluding boundaries)
   <>  (except range)

to define the precise check (must be uid 1234 _and_ gid 12345),
one has to go to something like this:

pass out proto tcp all group 12345 user 1234
 
Looks like Vadim found a bug and I'll take a look at the patch
he has sent.

regards
sashan



Re: pfctl: Fail on missing anchor

2020-01-15 Thread Alexandr Nedvedicky
Hello,


On Thu, Jan 16, 2020 at 12:15:39AM +0100, Klemens Nanni wrote:
> There is no reason to continue on anchor specific paths if the given
> anchor does not exist, so fix the last occurences better error messages:
> 
>   # pfctl -a regress\* -Fa
>   Anchor 'regress' not found.
>   pfctl: pfctl_get_anchors failed to retrieve list of anchors, can't 
> continue
>   # ./obj/pfctl -a regress\* -Fa
>   pfctl: Anchor does not exist
> 
> Feedback? OK?
> 

looks OK to me.

thanks and
regards
sashan



Re: pfctl: unify error message for nonexisting anchors

2020-01-15 Thread Alexandr Nedvedicky
Hello,



> There are other occasions as well but those probably need additional
> tweaks, so here's the first round.
> 
> Feedback? OK?

I like the idea. Just have one 'bike shedding' suggestion:

rename pfr_strerror() to pf_strerror() and move it from pfctl_radix.c
to pfctl.c as it becomes more generic now.

You have my OK. 

thanks and
regards
sashan



Re: pfctl: Refine error message

2020-01-15 Thread Alexandr Nedvedicky
On Wed, Jan 15, 2020 at 04:44:55PM +0100, Klemens Nanni wrote:
> While code in pf/pfctl confusingly uses either anchor or ruleset
> depending on the context, pfctl(8) (both manual and user interface)
> should be consistent.  There should be no difference between anchor and
> ruleset for users, implying there is one only makes for confusion:
> 
>   # pfctl -a regress -Fa
>   pfctl: Anchor or Ruleset does not exist.
> 
> The synopsis itself is `-a anchor' and with the other diff this error
> message format should be well in line with how we use err(3) across the
> tree.
> 
>   # ./obj/pfctl -a regress -Fa
>   pfctl: Anchor does not exist
> 
> OK?

makes sense.

OK sashan



Re: pfctl: Merge radix_perror() into simpler warnx()/errx() usage

2020-01-15 Thread Alexandr Nedvedicky
On Wed, Jan 15, 2020 at 04:26:20PM +0100, Klemens Nanni wrote:
> Less nesting for clearer code.
> 
> OK?
> 
> The macro backslashes got adjusted but diff with `-w' for ease of review.
> 

OK sashan



Re: pfctl - allow recursive flush operations [ 5/5 ]

2020-01-14 Thread Alexandr Nedvedicky
Hello


> > > 
> > > This reads simpler and clearer to me, what do you think?
> > 
> > OK, I'll buy if (...) from you, but '+ 1' must stay there,
> > because it is for a '\0' terminator. So let's go for this:
> > len = strlen(pr->name) + 1;
> > if (pr->path[0])
> > len += strlen(pr->path) + 1;
> 
> I think this should use asprintf() instead.
> 

and you are right.

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c
index 9c5778f4f97..16251d2e289 100644
--- a/sbin/pfctl/pfctl.c
+++ b/sbin/pfctl/pfctl.c
@@ -2173,7 +2173,6 @@ int
 pfctl_walk_get(int opts, struct pfioc_ruleset *pr, void *warg)
 {
struct pfr_anchoritem *pfra;
-   size_t len;
struct pfr_anchors  *anchors;
 
anchors = (struct pfr_anchors *) warg;
@@ -2182,19 +2181,15 @@ pfctl_walk_get(int opts, struct pfioc_ruleset *pr, void 
*warg)
if (pfra == NULL)
err(1, "%s", __func__);
 
-   len = strlen(pr->name) + 1;
+   pfra->pfra_anchorname = NULL;
if (pr->path[0])
-   len += strlen(pr->path) + 1;
+   asprintf(>pfra_anchorname, "%s/%s", pr->path, pr->name);
+   else
+   asprintf(>pfra_anchorname, "%s", pr->name);
 
-   pfra->pfra_anchorname = malloc(len);
if (pfra->pfra_anchorname == NULL)
err(1, "%s", __func__);
 
-   if (pr->path[0])
-   snprintf(pfra->pfra_anchorname, len, "%s/%s",
-   pr->path, pr->name);
-   else
-   snprintf(pfra->pfra_anchorname, len, "%s", pr->name);
 
SLIST_INSERT_HEAD(anchors, pfra, pfra_sle);
 



Re: pfctl - allow recursive flush operations [ 5/5 ]

2020-01-14 Thread Alexandr Nedvedicky
Hello,

On Tue, Jan 14, 2020 at 03:02:09PM +0100, Klemens Nanni wrote:
> On Mon, Jan 13, 2020 at 10:47:27PM +0100, Alexandr Nedvedicky wrote:
> > @@ -2182,7 +2182,7 @@ pfctl_walk_get(int opts, struct pfioc_ruleset *pr, 
> > void *warg)
> > if (pfra == NULL)
> > err(1, "%s", __func__);
> >  
> > -   len = strlen(pr->path) + 1;
> > +   len = (pr->path[0] == '\0') ? 0 : strlen(pr->path) + 1;
> > len += strlen(pr->name) + 1;
> pr->name is always used and only pr->path is optional.  With this diff
> you're always adding one with the name altough it is only required if
> path is given (for the "/" delimiter).
> 
>   len = strlen(pr->name);
>   if (pr->path[0])
>   len += strlen(pr->path) + 1;
> 
> This reads simpler and clearer to me, what do you think?

OK, I'll buy if (...) from you, but '+ 1' must stay there,
because it is for a '\0' terminator. So let's go for this:
len = strlen(pr->name) + 1;
if (pr->path[0])
len += strlen(pr->path) + 1;

> 
> > pfra->pfra_anchorname = malloc(len);
> > if (pfra->pfra_anchorname == NULL)
> > @@ -2312,7 +2312,8 @@ pfctl_recurse(int dev, int opts, const char 
> > *anchorname,
> >  
> > anchors = pfctl_get_anchors(dev, anchorname, opts);
> > /*
> > -* don't let pfctl_clear_*() function to fail with exit
> > +* don't let pfctl_clear_*() function to fail with exit, also make
> > +* pfctl_clear_tables(),(which is passed via walkf argument, quiet.
> Missing space after comma, extraneous parenthese as well.
> 
> >  */
> > opts |= PF_OPT_IGNFAIL | PF_OPT_QUIET;

my comment was misleading. I also want to make pfctl_clear_rules()
silent. But setting PF_OPT_QUIET flag in pfctl_call_() functions,
where we need it makes sense.

> 
> Below is a comment according to style(9) that seems fitting for IGNFAIL
> alone (leaving QUIET out now due to above);  it's more of a "why" rather
> than a "what" so the semantics behind IGNFAIL become clearer since it
> isn't used or documented anywhere else.
> 
>   /*
>* While traversing the list, pfctl_clear_*() must always return
>* so that failures on one anchor do not prevent clearing others.
>*/
>   opts |= PF_OPT_IGNFAIL;

comment reads far better.

diff below brings suggested changes to 5/5 patch.

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c
index 67dae4cdad9..9c5778f4f97 100644
--- a/sbin/pfctl/pfctl.c
+++ b/sbin/pfctl/pfctl.c
@@ -2182,8 +2182,10 @@ pfctl_walk_get(int opts, struct pfioc_ruleset *pr, void 
*warg)
if (pfra == NULL)
err(1, "%s", __func__);
 
-   len = (pr->path[0] == '\0') ? 0 : strlen(pr->path) + 1;
-   len += strlen(pr->name) + 1;
+   len = strlen(pr->name) + 1;
+   if (pr->path[0])
+   len += strlen(pr->path) + 1;
+
pfra->pfra_anchorname = malloc(len);
if (pfra->pfra_anchorname == NULL)
err(1, "%s", __func__);
@@ -2281,6 +2283,11 @@ pfctl_get_anchors(int dev, const char *anchor, int opts)
 int
 pfctl_call_cleartables(int dev, int opts, struct pfr_anchoritem *pfra)
 {
+   /*
+* PF_OPT_QUIET makes pfctl_clear_tables() to stop printing number of
+* tables cleared for given anchor.
+*/
+   opts |= PF_OPT_QUIET;
return ((pfctl_clear_tables(pfra->pfra_anchorname, opts) == -1) ?
1 : 0);
 }
@@ -2288,6 +2295,11 @@ pfctl_call_cleartables(int dev, int opts, struct 
pfr_anchoritem *pfra)
 int
 pfctl_call_clearrules(int dev, int opts, struct pfr_anchoritem *pfra)
 {
+   /*
+* PF_OPT_QUIET makes pfctl_clear_rules() to stop printing a 'rules
+* cleared' message for every anchor it deletes.
+*/
+   opts |= PF_OPT_QUIET;
return (pfctl_clear_rules(dev, opts, pfra->pfra_anchorname));
 }
 
@@ -2297,7 +2309,7 @@ pfctl_call_clearanchors(int dev, int opts, struct 
pfr_anchoritem *pfra)
int rv = 0;
 
rv |= pfctl_call_cleartables(dev, opts, pfra);
-   rv |= pfctl_clear_rules(dev, opts, pfra->pfra_anchorname);
+   rv |= pfctl_call_clearrules(dev, opts, pfra);
 
return (rv);
 }
@@ -2312,10 +2324,10 @@ pfctl_recurse(int dev, int opts, const char *anchorname,
 
anchors = pfctl_get_anchors(dev, anchorname, opts);
/*
-* don't let pfctl_clear_*() function to fail with exit, also make
-* pfctl_clear_tables(),(which is passed via walkf argument, quiet.
+* While traversing the list, pfctl_clear_*() must always return
+* so that failures on one anchor do not prevent clearing others.
 */
-   opts |= PF_OPT_IGNFAIL | PF_OPT_QUIET;
+   opts |= PF_OPT_IGNFAIL;
printf("Removing:\n");
SLIST_FOREACH_SAFE(pfra, anchors, pfra_sle, pfra_save) {
printf("  %s\n", (*pfra->pfra_anchorname == '\0') ?



Re: pfctl - allow recursive flush operations [ 5/5 ]

2020-01-14 Thread Alexandr Nedvedicky
Hello,


> > +   unsigned int len;
> strlen(3) returns size_t, malloc(3) takes it.

makes sense>

> 
> > +   struct pfr_anchors  *anchors;
> > +
> > +   anchors = (struct pfr_anchors *) warg;
> > +
> > +   pfra = malloc(sizeof(*pfra));
> > +   if (pfra == NULL)
> > +   err(1, "%s", __func__);
> > +
> > +   len = strlen(pr->path) + 1;
> > +   len += strlen(pr->name) + 1;
> > +   pfra->pfra_anchorname = malloc(len);
> > +   if (pfra->pfra_anchorname == NULL)
> > +   err(1, "%s", __func__);
> You're always allocating for path and name, but that is too much in the
> else branch.  I think this part can be improved.

you are right, I've opted for ternary operator when
calculating len for pr->path.

> 
> 
> > +int
> > +pfctl_recurse(int dev, int opts, const char *anchorname,
> > +int(*walkf)(int, int, struct pfr_anchoritem *))
> > +{
> > +   int  rv = 0;
> > +   struct pfr_anchors  *anchors;
> > +   struct pfr_anchoritem   *pfra, *pfra_save;
> > +
> > +   anchors = pfctl_get_anchors(dev, anchorname, opts);
> > +   /*
> > +* don't let pfctl_clear_*() function to fail with exit
> > +*/
> > +   opts |= PF_OPT_IGNFAIL | PF_OPT_QUIET;
> Why QUIET as well?

the pfctl_clear_tables() is not that silent by default.
we pass pfctl_clear_tables() via walkf argument. I've updated
the comment above to make explain purpose of PF_OPT_QUIET.

diff below updates patch 5/5.

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c
index cbf17d01b7d..67dae4cdad9 100644
--- a/sbin/pfctl/pfctl.c
+++ b/sbin/pfctl/pfctl.c
@@ -2173,7 +2173,7 @@ int
 pfctl_walk_get(int opts, struct pfioc_ruleset *pr, void *warg)
 {
struct pfr_anchoritem *pfra;
-   unsigned int len;
+   size_t len;
struct pfr_anchors  *anchors;
 
anchors = (struct pfr_anchors *) warg;
@@ -2182,7 +2182,7 @@ pfctl_walk_get(int opts, struct pfioc_ruleset *pr, void 
*warg)
if (pfra == NULL)
err(1, "%s", __func__);
 
-   len = strlen(pr->path) + 1;
+   len = (pr->path[0] == '\0') ? 0 : strlen(pr->path) + 1;
len += strlen(pr->name) + 1;
pfra->pfra_anchorname = malloc(len);
if (pfra->pfra_anchorname == NULL)
@@ -2312,7 +2312,8 @@ pfctl_recurse(int dev, int opts, const char *anchorname,
 
anchors = pfctl_get_anchors(dev, anchorname, opts);
/*
-* don't let pfctl_clear_*() function to fail with exit
+* don't let pfctl_clear_*() function to fail with exit, also make
+* pfctl_clear_tables(),(which is passed via walkf argument, quiet.
 */
opts |= PF_OPT_IGNFAIL | PF_OPT_QUIET;
printf("Removing:\n");



tripping assert in pf_state_key_link_reverse()

2020-01-14 Thread Alexandr Nedvedicky
Hello,

Some time ago I've tripped over ASSERT() in pf_state_key_link_reverse(), when
testing my changes to pfsync. I was using HP 710 systems with 8 cores I got
from Hrvoje. The panic happened rarely when running performance tesst over 10Gb
interfaces.

At time of panic the firewall was running with 'pass all' rule only.  The rule
creates state for inbound and outbound packets. At time of panic my box was
was running a custom kernel (compiled with WITH_PF_LOCK option and multiple
input net tasks).  My explanation how I could trip the assert is as follows:

client sends SYN packet, which creates two states.

now assume the server delays a response, so the response
crosses with retransmitted SYN in firewall. In this case
PF will be running two instances of pf_state_key_link_reverse().
In this case there is one winner and one looser, which trips assert.

the fix in patch below is straightforward. Use atomic ops to link keys, while
keeping same asserts in place, when either atomic op fails.

OK?

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/pf.c b/sys/net/pf.c
index ebe339921fa..738759e556c 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -7398,11 +7398,20 @@ pf_inp_unlink(struct inpcb *inp)
 void
 pf_state_key_link_reverse(struct pf_state_key *sk, struct pf_state_key *skrev)
 {
-   /* Note that sk and skrev may be equal, then we refcount twice. */
-   KASSERT(sk->reverse == NULL);
-   KASSERT(skrev->reverse == NULL);
-   sk->reverse = pf_state_key_ref(skrev);
-   skrev->reverse = pf_state_key_ref(sk);
+   struct pf_state_key *old_reverse;
+
+   old_reverse = atomic_cas_ptr(>reverse, NULL, skrev);
+   if (old_reverse != NULL)
+   KASSERT(old_reverse == skrev);
+   else
+   pf_state_key_ref(skrev);
+
+
+   old_reverse = atomic_cas_ptr(>reverse, NULL, sk);
+   if (old_reverse != NULL)
+   KASSERT(old_reverse == sk);
+   else
+   pf_state_key_ref(sk);
 }
 
 #if NPFLOG > 0



Re: pfctl - allow recursive flush operations [ 3/5 ]

2020-01-13 Thread Alexandr Nedvedicky
Hello,


> > @@ -2109,7 +2112,20 @@ pfctl_debug(int dev, u_int32_t level, int opts)
> >  }
> >  
> >  int
> > -pfctl_show_anchors(int dev, int opts, char *anchorname)
> > +pfctl_walk_show(int opts, struct pfioc_ruleset *pr, void *warg)
> > +{
> > +   if (pr->path[0]) {
> > +   if (pr->path[0] != '_' || (opts & PF_OPT_VERBOSE))
> The latter is always true because you only call this function inside
> 
>   if (opts & PF_OPT_VERBOSE) {
>   ...
>   }
> 

we should keep this test here in place, and fix pfctl_walk_show()
caller (see further below).

> > +   printf("  %s/%s\n", pr->path, pr->name);
> > +   } else if (pr->name[0] != '_' || (opts & PF_OPT_VERBOSE))
> > +   printf("  %s\n", pr->name);
> > +
> > +   return (0);
> Do you have plans for other callback functions in the future?  If so,
> returning `int' makes sense if those might possibly fail.

sure. The plans materialize in diff 4/5, which introduces pfctl_call_...()
functions. I really need return value.



> > +pfctl_walk_anchors(int dev, int opts, char *anchorname, void *warg,
> > +int(walkf)(int, struct pfioc_ruleset *, void *))
> "warg" and "walkf" reads weird:  Naturally function comes before
> argument(s) but then you're also abbreviating them inconsistently.
> How aobut "walkfunc" and "walkarg" in that order?
> 

makes sense. I'll fix it in diff 5/5, which I want to commit.


> > +   if (opts & PF_OPT_VERBOSE) {
> > +   charsub[PATH_MAX];
> Although the same, please use MAXPATHLEN since that is used to define
> PF_ANCHOR_MAXPATH in pfvar.h.

I see. Can we leave it for follow up change? PATH_MAX is currently
consistent with what's being used in PF.

> 
> > +
> > +   if (pr.path[0])
> > +   snprintf(sub, sizeof(sub), "%s/%s",
> > +   pr.path, pr.name);
> > +   else
> > +   snprintf(sub, sizeof(sub), "%s",
> > +   pr.name);
> > +   if (pfctl_walk_anchors(dev, opts, sub, warg, walkf))
> > +   return (-1);
> I'm probbaly missing something but why does recursion depend on
> PF_OPT_VERBOSE?  This also omits printing of internal anchors (prefixed
> with "_") unless PF_OPT_VERBOSE is set, whereas the previous code did
> this regardless of verbosity.
> 

the code is odd. It might be a result of copy'n'paste/fat finger.
I've delete the code in 5/5 diff.


> > +pfctl_show_anchors(int dev, int opts, char *anchorname)
> I know it's named "anchorname" already and this is nitpicking, but we're
> passing the full anchor path and name here, so both "anchorname" and
> "anchorpath" suggest either of the two... components;  how about just
> "anchor" like in `pfctl -a anchor'?  Then I don't have to double check
> whether this parameters actually contains the path, name or both.

yes, I agree. I'll rename it in 5/5 diff.

> 
> > +{
> > +   int rv;
> > +
> > +   rv = pfctl_walk_anchors(dev, opts, anchorname, NULL, pfctl_walk_show);
> > +   return (rv);
> How about simply
> 
>   return (pfctl_walk_anchors(dev, opts, anchorname, NULL,
>   pfctl_walk_show));
> 

yes, final 5/5 diff already contains the change you propose here.
As I've said earlier the partial diffs are just code review. Those
were created artificially, when I tried to split big change to
smaller changes, just to show the evolution of idea.


thanks and
regards
sashan



Re: pfctl - allow recursive flush operations [ 1/5 ]

2020-01-13 Thread Alexandr Nedvedicky
Hello,

this is the last piece, which makes everything fit together. The
change introduces three 'nuke()' functions:
- pfctl_call_clearrules()
- pfctl_call_cleartables()
- pfctl_call_clearanchors()

Those are callbacks for pfctl_recurse() we've introduced earlier. Those
functions just call existing pfctl_clear_...() function. We also must turn
pfctl_clear_...() function from void to int so error can be reported.  Also all
pfctl_clear_...() function must adhere a PF_OPT_IGNFAIL flag now.

Not diff below also includes a change, which does a right thing
in case we ask pfctl to remove just subtree:

pfctl -a 'foo/bar/*' -Fr

However keep in mind all those changes are uncovering glitches,
which exit in PF kernel driver already.

thanks and
regards
sashan

8<---8<---8<--8<

diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c
index 054da006e69..9fe661e8c89 100644
--- a/sbin/pfctl/pfctl.c
+++ b/sbin/pfctl/pfctl.c
@@ -51,10 +51,9 @@
 #include 
 #include 
 #include 
-
 #include 
-
 #include 
+#include 
 
 #include "pfctl_parser.h"
 #include "pfctl.h"
@@ -65,7 +64,7 @@ intpfctl_disable(int, int);
 voidpfctl_clear_queues(struct pf_qihead *);
 voidpfctl_clear_stats(int, const char *, int);
 voidpfctl_clear_interface_flags(int, int);
-voidpfctl_clear_rules(int, int, char *);
+int pfctl_clear_rules(int, int, char *);
 voidpfctl_clear_src_nodes(int, int);
 voidpfctl_clear_states(int, const char *, int);
 struct addrinfo *
@@ -110,11 +109,15 @@ void  pfctl_state_load(int, const char *);
 void   pfctl_reset(int, int);
 intpfctl_walk_show(int, struct pfioc_ruleset *, void *);
 intpfctl_walk_get(int, struct pfioc_ruleset *, void *);
-intpfctl_walk_anchors(int, int, char *, void *,
+intpfctl_walk_anchors(int, int, const char *, void *,
 int(*)(int, struct pfioc_ruleset *, void *));
 struct pfr_anchors *
-   pfctl_get_anchors(int, int);
-intpfctl_recurse(int, int, int(*nuke)(int, int, struct pfr_anchoritem *));
+   pfctl_get_anchors(int, const char *, int);
+intpfctl_recurse(int, int, const char *,
+   int(*)(int, int, struct pfr_anchoritem *));
+intpfctl_call_clearrules(int, int, struct pfr_anchoritem *);
+intpfctl_call_cleartables(int, int, struct pfr_anchoritem *);
+intpfctl_call_clearanchors(int, int, struct pfr_anchoritem *);
 
 const char *clearopt;
 char   *rulesopt;
@@ -244,7 +247,6 @@ static const char *optiopt_list[] = {
 struct pf_qihead qspecs = TAILQ_HEAD_INITIALIZER(qspecs);
 struct pf_qihead rootqs = TAILQ_HEAD_INITIALIZER(rootqs);
 
-
 __dead void
 usage(void)
 {
@@ -361,11 +363,10 @@ pfctl_clear_interface_flags(int dev, int opts)
}
 }
 
-void
+int
 pfctl_clear_rules(int dev, int opts, char *anchorname)
 {
-   struct pfr_buffer t;
-   int rv = 0;
+   struct pfr_buffer   t;
 
memset(, 0, sizeof(t));
t.pfrb_type = PFRB_TRANS;
@@ -373,9 +374,11 @@ pfctl_clear_rules(int dev, int opts, char *anchorname)
pfctl_trans(dev, , DIOCXBEGIN, 0) ||
pfctl_trans(dev, , DIOCXCOMMIT, 0)) {
pfctl_err(opts, 1, "%s", __func__);
-   rv = 1;
+   return (1);
} else if ((opts & PF_OPT_QUIET) == 0)
fprintf(stderr, "rules cleared\n");
+
+   return (0);
 }
 
 void
@@ -2171,9 +2174,9 @@ pfctl_walk_get(int opts, struct pfioc_ruleset *pr, void 
*warg)
 {
struct pfr_anchoritem *pfra;
unsigned int len;
-   struct {
-   struct pfr_anchoritem *pfra;
-   } *tail_pfra = warg;
+   struct pfr_anchors  *anchors;
+
+   anchors = (struct pfr_anchors *) warg;
 
pfra = malloc(sizeof(*pfra));
if (pfra == NULL)
@@ -2191,16 +2194,13 @@ pfctl_walk_get(int opts, struct pfioc_ruleset *pr, void 
*warg)
else
snprintf(pfra->pfra_anchorname, len, "%s", pr->name);
 
-   if (tail_pfra->pfra != NULL)
-   SLIST_INSERT_AFTER(tail_pfra->pfra, pfra, pfra_sle);
-
-   tail_pfra->pfra = pfra;
+   SLIST_INSERT_HEAD(anchors, pfra, pfra_sle);
 
return (0);
 }
 
 int
-pfctl_walk_anchors(int dev, int opts, char *anchorname, void *warg,
+pfctl_walk_anchors(int dev, int opts, const char *anchorname, void *warg,
 int(walkf)(int, struct pfioc_ruleset *, void *))
 {
struct pfioc_ruleset pr;
@@ -2249,54 +2249,82 @@ pfctl_walk_anchors(int dev, int opts, char *anchorname, 
void *warg,
 int
 pfctl_show_anchors(int dev, int opts, char *anchorname)
 {
-   int rv;
-
-   rv = pfctl_walk_anchors(dev, opts, anchorname, NULL, pfctl_walk_show);
-   return (rv);
+   return (
+   pfctl_walk_anchors(dev, opts, anchorname, NULL, pfctl_walk_show));
 }
 
 struct pfr_anchors *
-pfctl_get_anchors(int dev, int opts)
+pfctl_get_anchors(int dev, const char *anchorname, int opts)
 {
struct pfioc_rulesetpr;
- 

Re: pfctl - allow recursive flush operations [ 2/5 ]

2020-01-13 Thread Alexandr Nedvedicky
Hello,

change introduces a pfctl_recurse() function we use to implement
recursive flush operation. The pfctl_recurse() is built from
two pieces:
- pfctl_walk_get(), which itself is a new callback for pfctl_walk()
  we introduced in earlier patch. The pfctl_walk_get() puts every
  anchor found in tree to single linked list

- pfctl_get_anchors() uses pfctl_walk() to collect all anchors
  found in PF driver to list

- once there is a list of anchors pfctl_recurse() consumes
  the list of anchors applying a nuke() function. The nuke()
  function is an argument to pfctl_recurse(). The actual
  implementation is subject of follow up diff.

I've done some expirements with making pfctl_recurse() to pass
a nuke() callback directly to pfctl_walk() function, but the results
where 'disappointing'. We soon were tripping a panics on dangling
pointers. I think we need to revisit pf_ruleset.c before we will
be able to get rid of the extra list here. I'd like to stay pragmatic
and go ahead with list and makes things nice and tidy later.

thanks and
regards
sashan

8<---8<---8<--8<

diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c
index ba3610a6a7b..2c5835df7c6 100644
--- a/sbin/pfctl/pfctl.c
+++ b/sbin/pfctl/pfctl.c
@@ -107,8 +107,12 @@ void   pfctl_state_store(int, const char *);
 void   pfctl_state_load(int, const char *);
 void   pfctl_reset(int, int);
 intpfctl_walk_show(int, struct pfioc_ruleset *, void *);
+intpfctl_walk_get(int, struct pfioc_ruleset *, void *);
 intpfctl_walk_anchors(int, int, char *, void *,
 int(*)(int, struct pfioc_ruleset *, void *));
+struct pfr_anchors *
+   pfctl_get_anchors(int, int);
+intpfctl_recurse(int, int, int(*nuke)(int, int, struct pfr_anchoritem *));
 
 const char *clearopt;
 char   *rulesopt;
@@ -2123,6 +2127,39 @@ pfctl_walk_show(int opts, struct pfioc_ruleset *pr, void 
*warg)
return (0);
 }
 
+int
+pfctl_walk_get(int opts, struct pfioc_ruleset *pr, void *warg)
+{
+   struct pfr_anchoritem *pfra;
+   unsigned int len;
+   struct {
+   struct pfr_anchoritem *pfra;
+   } *tail_pfra = warg;
+
+   pfra = malloc(sizeof(*pfra));
+   if (pfra == NULL)
+   err(1, "%s", __func__);
+
+   len = strlen(pr->path) + 1;
+   len += strlen(pr->name) + 1;
+   pfra->pfra_anchorname = malloc(len);
+   if (pfra->pfra_anchorname == NULL)
+   err(1, "%s", __func__);
+
+   if (pr->path[0])
+   snprintf(pfra->pfra_anchorname, len, "%s/%s",
+   pr->path, pr->name);
+   else
+   snprintf(pfra->pfra_anchorname, len, "%s", pr->name);
+
+   if (tail_pfra->pfra != NULL)
+   SLIST_INSERT_AFTER(tail_pfra->pfra, pfra, pfra_sle);
+
+   tail_pfra->pfra = pfra;
+
+   return (0);
+}
+
 int
 pfctl_walk_anchors(int dev, int opts, char *anchorname, void *warg,
 int(walkf)(int, struct pfioc_ruleset *, void *))
@@ -2131,7 +2168,7 @@ pfctl_walk_anchors(int dev, int opts, char *anchorname, 
void *warg,
u_int32_tmnr, nr;
 
memset(, 0, sizeof(pr));
-   memcpy(pr.path, anchorname, sizeof(pr.path));
+   strlcpy(pr.path, anchorname, sizeof(pr.path));
if (ioctl(dev, DIOCGETRULESETS, ) == -1) {
if (errno == EINVAL)
fprintf(stderr, "Anchor '%s' not found.\n",
@@ -2179,6 +2216,56 @@ pfctl_show_anchors(int dev, int opts, char *anchorname)
return (rv);
 }
 
+struct pfr_anchors *
+pfctl_get_anchors(int dev, int opts)
+{
+   struct pfioc_rulesetpr;
+   struct {
+   struct pfr_anchoritem *pfra;
+   } pfra;
+   static struct pfr_anchors anchors;
+
+   SLIST_INIT();
+
+   memset(, 0, sizeof(pr));
+   pfra.pfra = NULL;
+   pfctl_walk_get(opts, , );
+   if (pfra.pfra == NULL) {
+   fprintf(stderr, "%s failed to allocate main anchor\n",
+   __func__);
+   exit(1);
+   }
+   SLIST_INSERT_HEAD(, pfra.pfra, pfra_sle);
+
+   opts |= PF_OPT_VERBOSE;
+   if (pfctl_walk_anchors(dev, opts, "", , pfctl_walk_get)) {
+   fprintf(stderr,
+   "%s failed to retreive list of anchors, can't continue\n",
+   __func__);
+   exit(1);
+   }
+
+   return ();
+}
+
+int
+pfctl_recurse(int dev, int opts, int(*nuke)(int, int, struct pfr_anchoritem *))
+{
+   int  rv = 0;
+   struct pfr_anchors  *anchors;
+   struct pfr_anchoritem   *pfra, *pfra_save;
+
+   anchors = pfctl_get_anchors(dev, opts);
+   SLIST_FOREACH_SAFE(pfra, anchors, pfra_sle, pfra_save) {
+   rv |= nuke(dev, opts, pfra);
+   SLIST_REMOVE(anchors, pfra, pfr_anchoritem, pfra_sle);
+   free(pfra->pfra_anchorname);
+   free(pfra);
+   }
+
+   return 

Re: pfctl - allow recursive flush operations [ 3/5 ]

2020-01-13 Thread Alexandr Nedvedicky
Hello,

change below splits 'pfctl_show_anchors()' to two functions:
- pfctl_walk_anchors() function, which can traverse all anchor
  tree loaded to PF driver and 

- pfctl_walk_show(), which is a call back for pfctl_walk_anchors()
  it just prints anchor name

this change is a foundation for a recursive flush.

thanks and
regards
sashan

8<---8<---8<--8<

diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c
index d1a309d919f..ba3610a6a7b 100644
--- a/sbin/pfctl/pfctl.c
+++ b/sbin/pfctl/pfctl.c
@@ -106,6 +106,9 @@ const char  *pfctl_lookup_option(char *, const char **);
 void   pfctl_state_store(int, const char *);
 void   pfctl_state_load(int, const char *);
 void   pfctl_reset(int, int);
+intpfctl_walk_show(int, struct pfioc_ruleset *, void *);
+intpfctl_walk_anchors(int, int, char *, void *,
+int(*)(int, struct pfioc_ruleset *, void *));
 
 const char *clearopt;
 char   *rulesopt;
@@ -2109,7 +2112,20 @@ pfctl_debug(int dev, u_int32_t level, int opts)
 }
 
 int
-pfctl_show_anchors(int dev, int opts, char *anchorname)
+pfctl_walk_show(int opts, struct pfioc_ruleset *pr, void *warg)
+{
+   if (pr->path[0]) {
+   if (pr->path[0] != '_' || (opts & PF_OPT_VERBOSE))
+   printf("  %s/%s\n", pr->path, pr->name);
+   } else if (pr->name[0] != '_' || (opts & PF_OPT_VERBOSE))
+   printf("  %s\n", pr->name);
+
+   return (0);
+}
+
+int
+pfctl_walk_anchors(int dev, int opts, char *anchorname, void *warg,
+int(walkf)(int, struct pfioc_ruleset *, void *))
 {
struct pfioc_ruleset pr;
u_int32_tmnr, nr;
@@ -2134,19 +2150,35 @@ pfctl_show_anchors(int dev, int opts, char *anchorname)
if (!strcmp(pr.name, PF_RESERVED_ANCHOR))
continue;
sub[0] = '\0';
-   if (pr.path[0]) {
-   strlcat(sub, pr.path, sizeof(sub));
-   strlcat(sub, "/", sizeof(sub));
-   }
-   strlcat(sub, pr.name, sizeof(sub));
-   if (sub[0] != '_' || (opts & PF_OPT_VERBOSE))
-   printf("  %s\n", sub);
-   if ((opts & PF_OPT_VERBOSE) && pfctl_show_anchors(dev, opts, 
sub))
+
+   if (walkf(opts, , warg))
return (-1);
+
+   if (opts & PF_OPT_VERBOSE) {
+   charsub[PATH_MAX];
+
+   if (pr.path[0])
+   snprintf(sub, sizeof(sub), "%s/%s",
+   pr.path, pr.name);
+   else
+   snprintf(sub, sizeof(sub), "%s",
+   pr.name);
+   if (pfctl_walk_anchors(dev, opts, sub, warg, walkf))
+   return (-1);
+   }
}
return (0);
 }
 
+int
+pfctl_show_anchors(int dev, int opts, char *anchorname)
+{
+   int rv;
+
+   rv = pfctl_walk_anchors(dev, opts, anchorname, NULL, pfctl_walk_show);
+   return (rv);
+}
+
 const char *
 pfctl_lookup_option(char *cmd, const char **list)
 {



Re: pfctl - allow recursive flush operations [ 4/5 ]

2020-01-13 Thread Alexandr Nedvedicky
Hello,

pfctl diff bellow enables me to relax err(3) and errx(3) to warnings
where needed. The idea is to introduce PF_OPT_IGNFAIL and pfctl_err()
and pfctl_errx() wrappers around err(3) and errx(3) functions.

the wrappers, when acting in warning mode (PF_OPT_IGNFAIL set), would
also set exit_val global variable so main() can report error.

thanks and
regards
sashan

8<---8<---8<--8<

diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c
index d1a309d919f..cf91b435121 100644
--- a/sbin/pfctl/pfctl.c
+++ b/sbin/pfctl/pfctl.c
@@ -54,6 +54,8 @@
 
 #include 
 
+#include 
+
 #include "pfctl_parser.h"
 #include "pfctl.h"
 
@@ -125,6 +127,7 @@ char*state_kill[2];
 int dev = -1;
 int first_title = 1;
 int labels = 0;
+int exit_val = 0;
 
 #define INDENT(d, o)   do {\
if (o) {\
@@ -251,6 +254,40 @@ usage(void)
exit(1);
 }
 
+void
+pfctl_err(int opts, int eval, const char *fmt, ...)
+{
+   va_list ap;
+
+   va_start(ap, fmt);
+
+   if ((opts & PF_OPT_IGNFAIL) == 0)
+   verr(eval, fmt, ap);
+   else
+   vwarn(fmt, ap);
+
+   va_end(ap);
+
+   exit_val = eval;
+}
+
+void
+pfctl_errx(int opts, int eval, const char *fmt, ...)
+{
+   va_list ap;
+
+   va_start(ap, fmt);
+
+   if ((opts & PF_OPT_IGNFAIL) == 0)
+   verrx(eval, fmt, ap);
+   else
+   vwarnx(fmt, ap);
+
+   va_end(ap);
+
+   exit_val = eval;
+}
+
 int
 pfctl_enable(int dev, int opts)
 {
@@ -289,10 +326,10 @@ pfctl_clear_stats(int dev, const char *iface, int opts)
memset(, 0, sizeof(pi));
if (iface != NULL && strlcpy(pi.pfiio_name, iface,
sizeof(pi.pfiio_name)) >= sizeof(pi.pfiio_name))
-   errx(1, "invalid interface: %s", iface);
+   pfctl_errx(opts, 1, "invalid interface: %s", iface);
 
if (ioctl(dev, DIOCCLRSTATUS, ) == -1)
-   err(1, "DIOCCLRSTATUS");
+   pfctl_err(opts, 1, "DIOCCLRSTATUS");
if ((opts & PF_OPT_QUIET) == 0) {
fprintf(stderr, "pf: statistics cleared");
if (iface != NULL)
@@ -311,32 +348,36 @@ pfctl_clear_interface_flags(int dev, int opts)
pi.pfiio_flags = PFI_IFLAG_SKIP;
 
if (ioctl(dev, DIOCCLRIFFLAG, ) == -1)
-   err(1, "DIOCCLRIFFLAG");
+   pfctl_err(opts, 1, "DIOCCLRIFFLAG");
if ((opts & PF_OPT_QUIET) == 0)
fprintf(stderr, "pf: interface flags reset\n");
}
 }
 
-void
+int
 pfctl_clear_rules(int dev, int opts, char *anchorname)
 {
struct pfr_buffer t;
+   int rv = 0;
 
memset(, 0, sizeof(t));
t.pfrb_type = PFRB_TRANS;
if (pfctl_add_trans(, PF_TRANS_RULESET, anchorname) ||
pfctl_trans(dev, , DIOCXBEGIN, 0) ||
-   pfctl_trans(dev, , DIOCXCOMMIT, 0))
-   err(1, "pfctl_clear_rules");
-   if ((opts & PF_OPT_QUIET) == 0)
+   pfctl_trans(dev, , DIOCXCOMMIT, 0)) {
+   pfctl_err(opts, 1, "%s", __func__);
+   rv = 1;
+   } else if ((opts & PF_OPT_QUIET) == 0)
fprintf(stderr, "rules cleared\n");
+
+   return (rv);
 }
 
 void
 pfctl_clear_src_nodes(int dev, int opts)
 {
if (ioctl(dev, DIOCCLRSRCNODES) == -1)
-   err(1, "DIOCCLRSRCNODES");
+   pfctl_err(opts, 1, "DIOCCLRSRCNODES");
if ((opts & PF_OPT_QUIET) == 0)
fprintf(stderr, "source tracking entries cleared\n");
 }
@@ -349,10 +390,10 @@ pfctl_clear_states(int dev, const char *iface, int opts)
memset(, 0, sizeof(psk));
if (iface != NULL && strlcpy(psk.psk_ifname, iface,
sizeof(psk.psk_ifname)) >= sizeof(psk.psk_ifname))
-   errx(1, "invalid interface: %s", iface);
+   pfctl_errx(opts, 1, "invalid interface: %s", iface);
 
if (ioctl(dev, DIOCCLRSTATES, ) == -1)
-   err(1, "DIOCCLRSTATES");
+   pfctl_err(opts, 1, "DIOCCLRSTATES");
if ((opts & PF_OPT_QUIET) == 0)
fprintf(stderr, "%d states cleared\n", psk.psk_killed);
 }
diff --git a/sbin/pfctl/pfctl.h b/sbin/pfctl/pfctl.h
index 7981cf66fdb..669ffcb3784 100644
--- a/sbin/pfctl/pfctl.h
+++ b/sbin/pfctl/pfctl.h
@@ -96,5 +96,7 @@ u_int32_t
 int pfctl_trans(int, struct pfr_buffer *, u_long, int);
 
 int pfctl_show_queues(int, const char *, int, int);
+voidpfctl_err(int, int, const char *, ...);
+voidpfctl_errx(int, int, const char *, ...);
 
 #endif /* _PFCTL_H_ */
diff --git a/sbin/pfctl/pfctl_osfp.c b/sbin/pfctl/pfctl_osfp.c
index 79abfd1a7ab..8c0ea24171b 100644
--- a/sbin/pfctl/pfctl_osfp.c
+++ b/sbin/pfctl/pfctl_osfp.c
@@ -260,7 +260,7 @@ void
 

pfctl - allow recursive flush operations [ 5/5 ]

2020-01-13 Thread Alexandr Nedvedicky
Hello,

below is a complete change I'd like to commit. Diff below enables pfctl to
recursively flush objects (tables, rules, anchors) from PF.  The same change
has been discussed last spring [1]. But the discussion died and all effort
dropped down on the floor. I'd like to restart the effort now.

The idea is to enable "pfctl -a '*' -F[art]" flush all objects recursively.
Change below does not update manpage yet. I'd like to fine tune manpage changes
in follow up commit. The idea is to make current implementation of "pfctl -a
'*' -sr" more generic. My change turns that to generic function, which can walk
tree of anchors and apply desired operation (show/flush) on every node found.

I'm going to change partial diffs just for reference, to make review bit
easier.  However I'd like to commit a big diff in one go, because partial
diffs were never tested.

Also kn@ was testing that change back in 2019 and found some glitches [2].
IMO the issues pointed out by kn@ are present in code already, diff below
just uncovers them. We can investigate/fix them once the change below will
be in.

I'll send partial diffs as a reply to this email to keep things organized.

thanks and
regards
sashan

[1] https://marc.info/?l=openbsd-tech=155545759616085=2

[2] https://marc.info/?l=openbsd-tech=155769978704563=2

8<---8<---8<--8<
diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c
index d1a309d919f..4a6bdc588a4 100644
--- a/sbin/pfctl/pfctl.c
+++ b/sbin/pfctl/pfctl.c
@@ -51,8 +51,9 @@
 #include 
 #include 
 #include 
-
 #include 
+#include 
+#include 
 
 #include "pfctl_parser.h"
 #include "pfctl.h"
@@ -63,7 +64,7 @@ intpfctl_disable(int, int);
 voidpfctl_clear_queues(struct pf_qihead *);
 voidpfctl_clear_stats(int, const char *, int);
 voidpfctl_clear_interface_flags(int, int);
-voidpfctl_clear_rules(int, int, char *);
+int pfctl_clear_rules(int, int, char *);
 voidpfctl_clear_src_nodes(int, int);
 voidpfctl_clear_states(int, const char *, int);
 struct addrinfo *
@@ -106,6 +107,17 @@ const char *pfctl_lookup_option(char *, const char **);
 void   pfctl_state_store(int, const char *);
 void   pfctl_state_load(int, const char *);
 void   pfctl_reset(int, int);
+intpfctl_walk_show(int, struct pfioc_ruleset *, void *);
+intpfctl_walk_get(int, struct pfioc_ruleset *, void *);
+intpfctl_walk_anchors(int, int, const char *, void *,
+int(*)(int, struct pfioc_ruleset *, void *));
+struct pfr_anchors *
+   pfctl_get_anchors(int, const char *, int);
+intpfctl_recurse(int, int, const char *,
+   int(*)(int, int, struct pfr_anchoritem *));
+intpfctl_call_clearrules(int, int, struct pfr_anchoritem *);
+intpfctl_call_cleartables(int, int, struct pfr_anchoritem *);
+intpfctl_call_clearanchors(int, int, struct pfr_anchoritem *);
 
 const char *clearopt;
 char   *rulesopt;
@@ -125,6 +137,7 @@ char*state_kill[2];
 int dev = -1;
 int first_title = 1;
 int labels = 0;
+int exit_val = 0;
 
 #define INDENT(d, o)   do {\
if (o) {\
@@ -234,7 +247,6 @@ static const char *optiopt_list[] = {
 struct pf_qihead qspecs = TAILQ_HEAD_INITIALIZER(qspecs);
 struct pf_qihead rootqs = TAILQ_HEAD_INITIALIZER(rootqs);
 
-
 __dead void
 usage(void)
 {
@@ -251,6 +263,40 @@ usage(void)
exit(1);
 }
 
+void
+pfctl_err(int opts, int eval, const char *fmt, ...)
+{
+   va_list ap;
+
+   va_start(ap, fmt);
+
+   if ((opts & PF_OPT_IGNFAIL) == 0)
+   verr(eval, fmt, ap);
+   else
+   vwarn(fmt, ap);
+
+   va_end(ap);
+
+   exit_val = eval;
+}
+
+void
+pfctl_errx(int opts, int eval, const char *fmt, ...)
+{
+   va_list ap;
+
+   va_start(ap, fmt);
+
+   if ((opts & PF_OPT_IGNFAIL) == 0)
+   verrx(eval, fmt, ap);
+   else
+   vwarnx(fmt, ap);
+
+   va_end(ap);
+
+   exit_val = eval;
+}
+
 int
 pfctl_enable(int dev, int opts)
 {
@@ -289,10 +335,10 @@ pfctl_clear_stats(int dev, const char *iface, int opts)
memset(, 0, sizeof(pi));
if (iface != NULL && strlcpy(pi.pfiio_name, iface,
sizeof(pi.pfiio_name)) >= sizeof(pi.pfiio_name))
-   errx(1, "invalid interface: %s", iface);
+   pfctl_errx(opts, 1, "invalid interface: %s", iface);
 
if (ioctl(dev, DIOCCLRSTATUS, ) == -1)
-   err(1, "DIOCCLRSTATUS");
+   pfctl_err(opts, 1, "DIOCCLRSTATUS");
if ((opts & PF_OPT_QUIET) == 0) {
fprintf(stderr, "pf: statistics cleared");
if (iface != NULL)
@@ -311,32 +357,35 @@ pfctl_clear_interface_flags(int dev, int opts)
pi.pfiio_flags = PFI_IFLAG_SKIP;
 
if (ioctl(dev, DIOCCLRIFFLAG, ) == -1)
-   err(1, 

Re: remove special ::1 ours check

2019-12-30 Thread Alexandr Nedvedicky
Hello,

On Tue, Dec 24, 2019 at 03:27:44PM +0100, Alexander Bluhm wrote:
> Hi,
> 
> The loopback check in ip6_input_if() seems needless.  The ::1
> destination address is in the routing table and will be identified
> as any other local address.  Better use the generic IP input path.
> I see no reason to handle ::1 source address special.  We do not
> have this check for IPv4.  Kame has removed it in this commit.
> 
> revision 1.189

> 
> ok?
> 

OK sashan



Re: attention please: host's IP stack behavior got changed slightly

2019-12-22 Thread Alexandr Nedvedicky
Hello,


On Fri, Dec 20, 2019 at 11:12:15PM +0100, Alexander Bluhm wrote:
> On Wed, Dec 18, 2019 at 09:07:35AM +0100, Alexandr Nedvedicky wrote:
> > I see. Updated diff below makes ip6_input_if() to explicitly check
> > for PF_TAG_TRANSLATE_LOCALHOST tag, when ip6_forwarding is disabled.
> >
> > if ip6_forwarding is enabled, then the ip6_input_if() keeps current
> > behavior.
> 
> You have misunderstood my internsion.
> 

Yes, I obviously did. Updated diff is below.



> 
> And the second question, but not for this commit, is why do we
> need this block?
> 
> if (IN6_IS_ADDR_LOOPBACK(>ip6_src) ||
> IN6_IS_ADDR_LOOPBACK(>ip6_dst)) {
> nxt = ip6_ours(mp, offp, nxt, af);
> goto out;
> }
> 
> It was removed in kame here:
> 


It did not come to my mind to check kame project, when seeing
code above. It looks like it has evolved to current shape
from original code below (as committed by itojun@)

+   /*
+* Check with the firewall...
+*/
+   if (ip6_fw_chk_ptr) {
+   u_short port = 0;
+   /* If ipfw says divert, we have to just drop packet */
+   /* use port as a dummy argument */
+   if ((*ip6_fw_chk_ptr)(, NULL, , )) {
+   m_freem(m);
+   m = NULL;
+   }
+   if (!m)
+   return;
+   }
+#endif
+
+   /*
+* Scope check
+*/
+   if (IN6_IS_ADDR_MULTICAST(>ip6_src) ||
+   IN6_IS_ADDR_UNSPECIFIED(>ip6_dst)) {
+   ip6stat.ip6s_badscope++;
+   in6_ifstat_inc(m->m_pkthdr.rcvif, ifs6_in_addrerr);
+   goto bad;
+   }
+   if (IN6_IS_ADDR_LOOPBACK(>ip6_src) ||
+   IN6_IS_ADDR_LOOPBACK(>ip6_dst)) {
+   if (m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) {
+   ours = 1;
+   deliverifp = m->m_pkthdr.rcvif;
+   goto hbhcheck;
+   } else {
+   ip6stat.ip6s_badscope++;
+   in6_ifstat_inc(m->m_pkthdr.rcvif, ifs6_in_addrerr);
+   goto bad;
+   }
+   }

It seems to me we should just remove them. You have my OK if you want
to do it. Perhaps intention in earlier change to ip6_input_if() was to
move IN6_IS_ADDR_LOOPBACK() checks before ip6_input_if() calls pf_test()
instead of doing copy'n'modified-paste.

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/netinet/ip_input.c b/sys/netinet/ip_input.c
index 058b2f038fa..f4114f45045 100644
--- a/sys/netinet/ip_input.c
+++ b/sys/netinet/ip_input.c
@@ -753,7 +753,8 @@ in_ouraddr(struct mbuf *m, struct ifnet *ifp, struct 
rtentry **prt)
}
}
} else if (ipforwarding == 0 && rt->rt_ifidx != ifp->if_index &&
-   !((ifp->if_flags & IFF_LOOPBACK) || (ifp->if_type == IFT_ENC))) {
+   !((ifp->if_flags & IFF_LOOPBACK) || (ifp->if_type == IFT_ENC) ||
+   (m->m_pkthdr.pf.flags & PF_TAG_TRANSLATE_LOCALHOST))) {
/* received on wrong interface. */
 #if NCARP > 0
struct ifnet *out_if;
diff --git a/sys/netinet6/ip6_input.c b/sys/netinet6/ip6_input.c
index 5404d7ccfb4..62e92d9c46c 100644
--- a/sys/netinet6/ip6_input.c
+++ b/sys/netinet6/ip6_input.c
@@ -435,7 +435,8 @@ ip6_input_if(struct mbuf **mp, int *offp, int nxt, int af, 
struct ifnet *ifp)
 
if (ip6_forwarding == 0 && rt->rt_ifidx != ifp->if_index &&
!((ifp->if_flags & IFF_LOOPBACK) ||
-   (ifp->if_type == IFT_ENC))) {
+   (ifp->if_type == IFT_ENC)) ||
+   (m->m_pkthdr.pf.flags & PF_TAG_TRANSLATE_LOCALHOST)) {
/* received on wrong interface */
 #if NCARP > 0
struct ifnet *out_if;




Re: attention please: host's IP stack behavior got changed slightly

2019-12-18 Thread Alexandr Nedvedicky
Hello,


On Wed, Dec 18, 2019 at 12:24:57AM +0100, Alexander Bluhm wrote:
> On Mon, Dec 16, 2019 at 03:42:27PM +0100, Alexandr Nedvedicky wrote:
> > > I think this is a "do as I want" kind of thing. If I use pf(4) to redirect
> > > traffic to a different address then I think our version of strict host
> > > model should step back and accept the connection.
> >
> > and also the change makes IPv4 behavior consistent with IPv6.
> > so if we won't be committing diff for IPv4, then we should change IPv6
> > to enforce divert-to for IPv6 too.
> 
> IPv4 and IPv6 code looks different.  In ip6_input_if() the
> IN6_IS_ADDR_LOOPBACK() check accepts packets redirected to ::1.  Do
> we really need that?  We always have ::1 on lo0 and a valid route.
> And why should a source ::1 enforce local delivery?  That looks
> odd.
> 
> I would prefer to have the PF_TAG_TRANSLATE_LOCALHOST check in both
> ip_input_if() and ip6_input_if() to explicitly make clear that
> redirect does not follow the strict host model.
> 

I see. Updated diff below makes ip6_input_if() to explicitly check
for PF_TAG_TRANSLATE_LOCALHOST tag, when ip6_forwarding is disabled.

if ip6_forwarding is enabled, then the ip6_input_if() keeps current
behavior.

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/netinet/ip_input.c b/sys/netinet/ip_input.c
index 058b2f038fa..f4114f45045 100644
--- a/sys/netinet/ip_input.c
+++ b/sys/netinet/ip_input.c
@@ -753,7 +753,8 @@ in_ouraddr(struct mbuf *m, struct ifnet *ifp, struct 
rtentry **prt)
}
}
} else if (ipforwarding == 0 && rt->rt_ifidx != ifp->if_index &&
-   !((ifp->if_flags & IFF_LOOPBACK) || (ifp->if_type == IFT_ENC))) {
+   !((ifp->if_flags & IFF_LOOPBACK) || (ifp->if_type == IFT_ENC) ||
+   (m->m_pkthdr.pf.flags & PF_TAG_TRANSLATE_LOCALHOST))) {
/* received on wrong interface. */
 #if NCARP > 0
struct ifnet *out_if;
diff --git a/sys/netinet6/ip6_input.c b/sys/netinet6/ip6_input.c
index 5404d7ccfb4..919f8ae8f03 100644
--- a/sys/netinet6/ip6_input.c
+++ b/sys/netinet6/ip6_input.c
@@ -335,8 +335,11 @@ ip6_input_if(struct mbuf **mp, int *offp, int nxt, int af, 
struct ifnet *ifp)
goto bad;
}
 
-   if (IN6_IS_ADDR_LOOPBACK(>ip6_src) ||
-   IN6_IS_ADDR_LOOPBACK(>ip6_dst)) {
+   if (((ip6_forwarding != 0) && ((IN6_IS_ADDR_LOOPBACK(>ip6_src) ||
+   IN6_IS_ADDR_LOOPBACK(>ip6_dst ||
+   ((ip6_forwarding == 0) &&
+   ((m->m_pkthdr.pf.flags & PF_TAG_TRANSLATE_LOCALHOST) &&
+   IN6_IS_ADDR_LOOPBACK(>ip6_dst {
nxt = ip6_ours(mp, offp, nxt, af);
goto out;
}



Re: attention please: host's IP stack behavior got changed slightly

2019-12-16 Thread Alexandr Nedvedicky
Hello,

On Mon, Dec 16, 2019 at 03:21:43PM +0100, Claudio Jeker wrote:
> On Mon, Dec 16, 2019 at 02:13:50PM +0100, Alexander Bluhm wrote:
> > On Sun, Dec 15, 2019 at 03:17:26PM +0100, Alexandr Nedvedicky wrote:
> > > Hello Daniel,
> > >
> > > thanks for reporting back.
> > >
> > > 
> > > > Should the rdr-to rule still work? I fixed it with using the "Port foo"
> > > > directive in my sshd config (and a simple "pass in to port foo") in the
> > > > meantime.
> > >
> > > My earlier indeed change omits your usecase. The rdr rule should still
> > > work. Patch below should fix it. The idea is to check whether the
> > > packet got NATed to loopback. We let packet in, if it got changed
> > > by PF.
> > >
> > > The IPv6 part does not need similar fix. According to quick check
> > > of existing code it works.
> > >
> > > OK ?
> > 
> > Redirect to localhost is a violation of the strict host model.
> > Why not encourage people to use divert-to for local delivery?
> 
> I think this is a "do as I want" kind of thing. If I use pf(4) to redirect
> traffic to a different address then I think our version of strict host
> model should step back and accept the connection.
>  

and also the change makes IPv4 behavior consistent with IPv6.
so if we won't be committing diff for IPv4, then we should change IPv6
to enforce divert-to for IPv6 too.

thanks and
regards
sashan



Re: attention please: host's IP stack behavior got changed slightly

2019-12-15 Thread Alexandr Nedvedicky
Hello Daniel,

thanks for reporting back.


> Should the rdr-to rule still work? I fixed it with using the "Port foo"
> directive in my sshd config (and a simple "pass in to port foo") in the
> meantime.

My earlier indeed change omits your usecase. The rdr rule should still
work. Patch below should fix it. The idea is to check whether the
packet got NATed to loopback. We let packet in, if it got changed
by PF.

The IPv6 part does not need similar fix. According to quick check
of existing code it works.

OK ?

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/netinet/ip_input.c b/sys/netinet/ip_input.c
index 058b2f038fa..f4114f45045 100644
--- a/sys/netinet/ip_input.c
+++ b/sys/netinet/ip_input.c
@@ -753,7 +753,8 @@ in_ouraddr(struct mbuf *m, struct ifnet *ifp, struct 
rtentry **prt)
}
}
} else if (ipforwarding == 0 && rt->rt_ifidx != ifp->if_index &&
-   !((ifp->if_flags & IFF_LOOPBACK) || (ifp->if_type == IFT_ENC))) {
+   !((ifp->if_flags & IFF_LOOPBACK) || (ifp->if_type == IFT_ENC) ||
+   (m->m_pkthdr.pf.flags & PF_TAG_TRANSLATE_LOCALHOST))) {
/* received on wrong interface. */
 #if NCARP > 0
struct ifnet *out_if;



Re: pfctl: Do not optimize empty rulesets

2019-12-13 Thread Alexandr Nedvedicky
Hello,

sorry it dropped on the floor of my INBOX...
thanks for reminding.


> Yes, the main anchor prints as "" but all that is behind compile time
> -DOPT_DEBUG so regular users won't deal with it anyway, so keep the code
> simple instead of adding logging around `rs->anchor->path'.
> 
> OK?

change looks good. OK sashan@

> 
> 
> Index: pfctl_optimize.c
> ===
> RCS file: /cvs/src/sbin/pfctl/pfctl_optimize.c,v
> retrieving revision 1.42
> diff -u -p -r1.42 pfctl_optimize.c
> --- pfctl_optimize.c  28 Jun 2019 13:32:45 -  1.42
> +++ pfctl_optimize.c  12 Dec 2019 20:06:15 -
> @@ -270,7 +270,10 @@ pfctl_optimize_ruleset(struct pfctl *pf,
>   struct pf_rule *r;
>   struct pf_rulequeue *old_rules;
>  
> - DEBUG("optimizing ruleset");
> + if (TAILQ_EMPTY(rs->rules.active.ptr))
> + return (0);
> +
> + DEBUG("optimizing ruleset \"%s\"", rs->anchor->path);
>   memset(_buffer, 0, sizeof(table_buffer));
>   skip_init();
>   TAILQ_INIT(_queue);
> 



  1   2   3   4   >