Re: huge pfsync rewrite

2023-07-03 Thread Alexandr Nedvedicky
Hello,

I went through the recent diff one more time. I could not spot
anything wrong. Also my home router was happy with it for
quite some time. Unfortunately I'm not using pfsync.
However I'm sure hrvoje@ did his best to try to make it
crash, and had no luck: the diff survived.

As I said earlier, it would be more risky if dlg@ were to slice
this chunk into smaller diffs; it is best to commit the
whole change.


OK sashan



Re: huge pfsync rewrite

2023-07-02 Thread David Gwynne
On Sun, Jul 02, 2023 at 12:44:17PM +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> On Thu, Jun 29, 2023 at 01:48:27PM +1000, David Gwynne wrote:
> > On Mon, Jun 26, 2023 at 01:16:40AM +0200, Alexandr Nedvedicky wrote:
> > 
> > > net/if_pfsync.c
> > >   the diff currently uses two slices (PFSYNC_NSLICES). is there a plan to
> > >   scale it up?  the slice can be simply viewed as a kind of task. IMO the
> > >   number of slices can be aligned with the number of cpu cores. Or is this
> > >   too simplified? I'm just trying to get some hints on how to further
> > >   tune performance.
> > 
> > that's part of a bigger discussion which involves how far we should
> > scale the number of nettqs and how parallel pf can go.
> > 
> > 2 slices demonstrates that pfsync can partition work and is safe doing
> > so. the kstats ive added on those slices show there isnt a lot of
> > contention in pfsync. yet.
> > 
> 
> I was just wondering, because if I remember correctly hrvoje@ noticed a
> small performance degradation (compared with current). I think his test
> was using 4 net tasks to forward packets through the firewall.  now if
> there are just 2 tasks for pfsync, then this might be how the degradation
> sneaked in. just a thought.

if i remember correctly that result was when i was using the high bits
in the toeplitz hash on pf states to pick a pfsync slice. since i
changed it to use the same bits as the hardware/stack/pf his numbers
showed that old and new pfsync perform pretty much the same.

im running 8 nettqs with 2 pfsync slices in production, and the
pfsync slice mutexes were contended about 1.3% of the time on average
over the last 7 days. i havent tried scaling the number of slices up to
see what effect that has yet.
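
fwiw, the idea is just to pick the slice from the same low bits of the
toeplitz hash that the stack already uses to pick a nettq. a sketch of
that idea (not the literal code in the diff; the helper name here is
made up):

	/* PFSYNC_NSLICES is a power of two, so masking the low bits works */
	static inline unsigned int
	pfsync_slice_idx(uint32_t hash)
	{
		return (hash & (PFSYNC_NSLICES - 1));
	}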



Re: huge pfsync rewrite

2023-07-02 Thread Alexandr Nedvedicky
Hello,

On Thu, Jun 29, 2023 at 01:48:27PM +1000, David Gwynne wrote:
> On Mon, Jun 26, 2023 at 01:16:40AM +0200, Alexandr Nedvedicky wrote:
> 
> > net/if_pfsync.c
> > the diff currently uses two slices (PFSYNC_NSLICES). is there a plan to
> > scale it up?  the slice can be simply viewed as a kind of task. IMO the
> > number of slices can be aligned with the number of cpu cores. Or is this
> > too simplified? I'm just trying to get some hints on how to further
> > tune performance.
> 
> that's part of a bigger discussion which involves how far we should
> scale the number of nettqs and how parallel pf can go.
> 
> 2 slices demonstrates that pfsync can partition work and is safe doing
> so. the kstats ive added on those slices show there isnt a lot of
> contention in pfsync. yet.
> 

I was just wondering, because if I remember correctly hrvoje@ noticed a
small performance degradation (compared with current). I think his test was
using 4 net tasks to forward packets through the firewall.  now if there are
just 2 tasks for pfsync, then this might be how the degradation
sneaked in. just a thought.

thanks and
regards
sashan



Re: huge pfsync rewrite

2023-06-28 Thread David Gwynne
On Mon, Jun 26, 2023 at 01:16:40AM +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> The new code looks much better. It's a huge leap forward.  I don't mind if
> this big diff gets committed. Hrvoje did test the whole diff. Trying to
> split it into smaller changes might bring in code which is not tested
> enough.  We don't know how the individual pieces work independently.
> 
> I've got some questions/comments which are perhaps worth sorting out
> before commit.
> 
> the diff touches those files:
> 
> net/if.c
>   - no objections/comments
> 
> net/ifq.c
>   - no objections/comments
> 
> net/netisr.h
>   - no objections/comments
> 
> net/pf_norm.c
>   - no objections/comments
> 
> net/pf_ioctl.c
>   - no objections/comments
> 
> net/pfvar.h
>   - no objections/comments
> 
> net/pfvar_priv.h
>   - no objections/comments
> 
> netinet/ip_ipsp.h
>   - no objections/comments
> 
> netinet/in_proto.c
>   - just wondering why we rename/introduce pfsync_input4().
> is there a plan to make pfsync(4) work over IPv6?
> 
> net/pf.c
>   state expiration is overhauled, the new code looks better.
>   once the new code settles down we might need to revisit the
>   'set timeout interval' description in pf.conf(5),
>   because if I understand the new code right, the state
>   expiration timer runs every second now. We might also
>   consider exposing `pf_purge_states_collect` as a new
>   tuning knob to control state expiration. Anyway this
>   can always be done as a follow-up change.

the current code (pf_purge()) runs state expiration every second. we
could tweak the doco now.
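
roughly, the shape of it is a once-a-second timeout that only walks a
bounded batch of states per run. the following is only a sketch to go
with the pf_purge_states_collect knob mentioned above; the names,
default, and signatures here are approximations, not the code in the
diff:

	/* assumed to be set up during pf initialisation (sketch only) */
	struct timeout	pf_purge_states_to;
	unsigned int	pf_purge_states_collect = 64;	/* made-up default */

	void
	pf_purge_states_tick(void *unused)
	{
		/* reap at most a batch of expired states on this pass */
		pf_purge_expired_states(pf_purge_states_collect);

		/* and reschedule for a second from now */
		timeout_add_sec(&pf_purge_states_to, 1);
	}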

>   - function  pf_state_export() line 1319 comment XXX
>   1319 if (READ_ONCE(st->sync_defer) != NULL) /* XXX */
>   1320 sp->state_flags |= htons(PFSTATE_ACK);
> what does it stand for?

if there's a pfsync_defer struct on the pf_state, then there's an mbuf
waiting for the peer to ack the state insertion. we need to add
PFSTATE_ACK in the pfsync message sent to the peer so it knows to reply
with an IACK.

the XXX can be removed. i can't see a hazard/panic with how that code
currently works. we're testing if the pointer is not NULL rather than
trying to dereference it.
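
ie, once the XXX goes away the check just wants a comment instead. the
code below is the same two lines quoted above; only the comment is new:

	/*
	 * a pfsync_defer hanging off the state means we're holding an
	 * mbuf until the peer acks this insert, so ask for an IACK.
	 */
	if (READ_ONCE(st->sync_defer) != NULL)
		sp->state_flags |= htons(PFSTATE_ACK);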

>   - function pf_purge_expired_states(): handling of static
> variable `cur` is unsafe w.r.t. pf_state_list. However
> the same issue is present in current. I think there is
> a diff floating around which makes cur a member of the
> pf_state_list structure.

only pf_purge_expired_states removes states from the list, so it
having cur pointing into it is safe. it would be a lot neater if it was
in the pf_state_list structure though.
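
something along these lines, say. the existing members below are
paraphrased from memory and pfs_purge_cur is a made-up name, so treat
this as a sketch of the idea rather than a diff:

	struct pf_state_list {
		TAILQ_HEAD(, pf_state)	 pfs_list;	/* the states */
		struct mutex		 pfs_mtx;	/* serialise insertions */
		struct rwlock		 pfs_rwl;	/* iteration vs removal */

		/* purge cursor, replacing the static variable in
		 * pf_purge_expired_states() */
		struct pf_state		*pfs_purge_cur;
	};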

> net/if_pfsync.h
>   the new if_pfsync changes the codes for sync_state. This makes me wonder
>   if we should also update PFSYNC_VERSION. Just to avoid some
>   undesired/undefined behavior if only one firewall node in the cluster gets
>   upgraded.

the wire protocol remains compatible, and it's been tested to be compatible.
this is important for planned upgrades to firewall pairs. i've been beating
my head on this diff for so long because i've been wanting to upgrade
our big pair of firewalls at work, and we've been stuck on openbsd 6.9
until now.

> net/if_pfsync.c
>   the diff currently uses two slices (PFSYNC_NSLICES). is there a plan to
>   scale it up?  the slice can be simply viewed as a kind of task. IMO the
>   number of slices can be aligned with the number of cpu cores. Or is this
>   too simplified? I'm just trying to get some hints on how to further
>   tune performance.

that's part of a bigger discussion which involves how far we should
scale the number of nettqs and how parallel pf can go.

2 slices demonstrates that pfsync can partition work and is safe doing
so. the kstats ive added on those slices show there isnt a lot of
contention in pfsync. yet.

>   I noticed there are a few dances with NET_LOCK() (UNLOCK()/LOCK())
>   in the up() and down() operations. I feel this comes from the fact that
>   we use NET_LOCK() to also protect the state of pfsync(4) itself. I wonder
>   if we can make our life easier here with a dedicated rw-lock similar
>   to the pfioctl_rw we have in pf(4). The dedicated rw-lock would make
>   sure there is no other ioctl() operation interfering with us.
>   This can certainly be left for a follow-up change.

pfsync doesn't really use NET_LOCK itself anymore, but the stack
requires it to serialise changes to struct ifnet if_flags, and it's
already held when most of the ioctls are called. it's also supposed to
be held when you call ip_output, which is where an annoying amount of
complexity comes from tbh.

i did try a dedicated rwlock to coordinate between pfsync_up and
pfsync_down, but it introduced a 3 way deadlock. the sc_up flag which
pfsync_up returns early on is enough to break that deadlock.
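
the shape of that is just a flag check at the top, roughly like this
(a sketch only; the locking around it and the rest of the function are
elided, and the details differ from the actual diff):

	void
	pfsync_up(struct pfsync_softc *sc)
	{
		NET_ASSERT_LOCKED();

		if (sc->sc_up) {
			/* already up; bail before taking more locks */
			return;
		}
		sc->sc_up = 1;

		/* ... bring the interface up, kick the slices, etc ... */
	}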

Re: huge pfsync rewrite

2023-06-25 Thread Alexandr Nedvedicky
Hello,

The new code looks much better. It's a huge leap forward.  I don't mind if this
big diff gets committed. Hrvoje did test the whole diff. Trying to split it
into smaller changes might bring in code which is not tested enough.  We don't
know how the individual pieces work independently.

I've got some questions/comments which are perhaps worth sorting out
before commit.

the diff touches those files:

net/if.c
- no objections/comments

net/ifq.c
- no objections/comments

net/netisr.h
- no objections/comments

net/pf_norm.c
- no objections/comments

net/pf_ioctl.c
- no objections/comments

net/pfvar.h
- no objections/comments

net/pfvar_priv.h
- no objections/comments

netinet/ip_ipsp.h
- no objections/comments

netinet/in_proto.c
- just wondering why we rename/introduce pfsync_input4().
  is there a plan to make pfsync(4) work over IPv6?

net/pf.c
state expiration is overhauled, the new code looks better.
once the new code settles down we might need to revisit the
'set timeout interval' description in pf.conf(5),
because if I understand the new code right, the state
expiration timer runs every second now. We might also
consider exposing `pf_purge_states_collect` as a new
tuning knob to control state expiration. Anyway this
can always be done as a follow-up change.

- function  pf_state_export() line 1319 comment XXX
1319 if (READ_ONCE(st->sync_defer) != NULL) /* XXX */
1320 sp->state_flags |= htons(PFSTATE_ACK);
  what does it stand for?

- function pf_purge_expired_states(): handling of static
  variable `cur` is unsafe w.r.t. pf_state_list. However
  the same issue is present in current. I think there is
a diff floating around which makes cur a member of the
  pf_state_list structure.

net/if_pfsync.h
the new if_pfsync changes the codes for sync_state. This makes me wonder
if we should also update PFSYNC_VERSION. Just to avoid some
undesired/undefined behavior if only one firewall node in the cluster gets
upgraded.

net/if_pfsync.c
the diff currently uses two slices (PFSYNC_NSLICES). is there a plan to
scale it up?  the slice can be simply viewed as a kind of task. IMO the
number of slices can be aligned with the number of cpu cores. Or is this
too simplified? I'm just trying to get some hints on how to further
tune performance.

I noticed there are a few dances with NET_LOCK() (UNLOCK()/LOCK())
in the up() and down() operations. I feel this comes from the fact that
we use NET_LOCK() to also protect the state of pfsync(4) itself. I wonder
if we can make our life easier here with a dedicated rw-lock similar
to the pfioctl_rw we have in pf(4). The dedicated rw-lock would make
sure there is no other ioctl() operation interfering with us.
This can certainly be left for a follow-up change.

- I think lines 102-108 and 110 can be removed:
 102 #if 0
 103 #define DPRINTF(_fmt...) do {   \
 104 printf("%s[%u]: ", __func__, __LINE__); \
 105 printf(_fmt);   \
 106 printf("\n");   \
 107 } while (0)
 108 #else

- function pfsync_clone_create(), line 384:
 381 pool_init(&pfsync_deferrals_pool,
 382 sizeof(struct pfsync_deferral), 0,
 383 IPL_MPFLOOR, 0, "pfdefer", NULL);
 384 /* pool_cache_init(&pfsync_deferrals_pool); */
  does it mean pool_cache_init() is not ready/well tested yet?
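  (for reference: pool_cache_init() switches an already-initialised pool
  over to per-CPU item caches, so turning it on later should just mean
  uncommenting that call, roughly:

	pool_init(&pfsync_deferrals_pool,
	    sizeof(struct pfsync_deferral), 0,
	    IPL_MPFLOOR, 0, "pfdefer", NULL);
	pool_cache_init(&pfsync_deferrals_pool);	/* per-CPU caches */

  the pool name here is taken from the snippet above.)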

- function pfsync_clone_create(), comments /* grumble ... */
  on lines 401 and 405, what are those about? Are they related
  to line 440: /* stupid NET_LOCK */? is it about using a timeout
  to deal with NET_LOCK(), like avoiding recursion on NET_LOCK()?

- function pfsync_sendout() line 1539, comment /* XXX */
  what does it stand for?

- function pfsync_clear_states() is it complete? it seems
  to me it has no effect (apart from the exercise with the reference)

I have not spotted anything else.

thanks and
regards
sashan