Re: Use SMR instead of SRP list in rtsock.c

2022-06-30 Thread Vitaliy Makkoveev
On Thu, Jun 30, 2022 at 02:32:03PM +0200, Claudio Jeker wrote: > On Thu, Jun 30, 2022 at 03:21:40PM +0300, Vitaliy Makkoveev wrote: > > On Thu, Jun 30, 2022 at 11:56:55AM +0200, Claudio Jeker wrote: > > > On Thu, Jun 30, 2022 at 12:34:33PM +0300, Vitaliy Makkoveev wrote: >

Re: Use SMR instead of SRP list in rtsock.c

2022-06-30 Thread Vitaliy Makkoveev
On Thu, Jun 30, 2022 at 11:51:52AM +0200, Claudio Jeker wrote: > On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote: > > This diff converts the SRP list to a SMR list in rtsock.c > > SRP is a bit strange with how it works and the SMR code is a bit easier to > > understand. Since we can

Re: Use SMR instead of SRP list in rtsock.c

2022-06-30 Thread Vitaliy Makkoveev
On Thu, Jun 30, 2022 at 11:56:55AM +0200, Claudio Jeker wrote: > On Thu, Jun 30, 2022 at 12:34:33PM +0300, Vitaliy Makkoveev wrote: > > On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote: > > > This diff converts the SRP list to a SMR list in rtsock.c > > > SR

Re: Use SMR instead of SRP list in rtsock.c

2022-06-30 Thread Vitaliy Makkoveev
On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote: > This diff converts the SRP list to a SMR list in rtsock.c > SRP is a bit strange with how it works and the SMR code is a bit easier to > understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to grab > a refcount on the

npppd(8): remove PIPEXCSESSION ioctl(2) command

2022-06-30 Thread Vitaliy Makkoveev
yasuoka@ remonded me, long time ago pipex(4) sessions can't be deleted until both input and output queues become empty: pipex_timer(void *ignored_arg) { /* ... */ switch (session->state) { /* ... */ case PIPEX_STATE_CLOSED:

Remove switch(4) remains from netisr.h

2022-06-29 Thread Vitaliy Makkoveev
Index: sys/net/netisr.h === RCS file: /cvs/src/sys/net/netisr.h,v retrieving revision 1.58 diff -u -p -r1.58 netisr.h --- sys/net/netisr.h29 Jun 2022 09:08:07 - 1.58 +++ sys/net/netisr.h29 Jun 2022 12:23:44 - @@

Re: pipex(4): use per-session `pxs_mtx' mutex(9) for protection

2022-06-28 Thread Vitaliy Makkoveev
On Tue, Jun 28, 2022 at 09:05:09PM +0300, Vitaliy Makkoveev wrote: > The updated diff. It was triggered by Hrvoje Popovski, we could do > direct (*if_qstart)() call in pipex(4) PPPOE input path, and this > provides deadlock. The updated diff uses ip{,6}_send() instead of > ipv{4,6}_i

Re: pipex(4): use per-session `pxs_mtx' mutex(9) for protection

2022-06-28 Thread Vitaliy Makkoveev
The updated diff. It was triggered by Hrvoje Popovski, we could do direct (*if_qstart)() call in pipex(4) PPPOE input path, and this provides deadlock. The updated diff uses ip{,6}_send() instead of ipv{4,6}_input(). r420-1# witness: acquiring duplicate lock of same type: ">pxs_mtx" 1st >pxs_mtx

pipex(4): use per-session `pxs_mtx' mutex(9) for protection

2022-06-28 Thread Vitaliy Makkoveev
We can't predict netlock state when we executing pipex(4) related (*if_qstart)() handlers, so we can't use netlock for pipex(4) protection. We already use `pipex_list_mtx' for global pipex(4) data, so Use per-session `pxs_mtx' mutex(9) for pipex session context protection. We already use

do pppoe(4) input through netisr

2022-06-28 Thread Vitaliy Makkoveev
ether_input() called with shared netlock, but pppoe(4) wants it to be exclusive. Do the pppoe(4) input within netisr handler with exclusive netlok held, and remove kernel lock hack from ether_input(). This is the step back, but it makes ether_input() path better then it is now. ok? Index:

Re: Remove switch(4) leftovers

2022-06-27 Thread Vitaliy Makkoveev
On Mon, Jun 27, 2022 at 01:44:55PM +, Visa Hankala wrote: > Remove some switch(4) leftovers. > > OK? > ok mvs > Index: etc/etc.hppa/MAKEDEV.md > === > RCS file: src/etc/etc.hppa/MAKEDEV.md,v > retrieving revision 1.68 > diff

Re: struct refcnt for routes

2022-06-27 Thread Vitaliy Makkoveev
On Mon, Jun 27, 2022 at 03:21:56PM +0200, Alexander Bluhm wrote: > Hi, > > Use ref count API for routes. > > ok? > Some time ago, I posted the diff which removes the logic around negative `refcnt' within rtfree(). This diff was stopped by mpi@, he said we could underflow rt->rt_refcnt and this

pipex(4): do pppoe output through netisr

2022-06-27 Thread Vitaliy Makkoveev
We can't predict netlock state when pipex(4) related (*if_qstart)() handlers called. This means we can't use netlock within pppac_qstart() and pppx_if_qstart() handlers. But actually we can't avoid netlock only when we call (*if_output)() in pipex(4) PPPOE output path. Introduce `pipexoutq'

Re: arp getuptime

2022-06-27 Thread Vitaliy Makkoveev
On Mon, Jun 27, 2022 at 01:58:11PM +0200, Alexander Bluhm wrote: > Hi, > > Instead of calling getuptime() all the time in ARP code, I would > like to do it only once per function. This should give us a more > consistent time value. > > ok? > ok mvs@ > bluhm > > Index: netinet/if_ether.c >

Re: Check `flags' on target session, not multicast session

2022-06-27 Thread Vitaliy Makkoveev
On Mon, Jun 27, 2022 at 01:29:26AM +0300, Vitaliy Makkoveev wrote: > We should check PIPEX_SFLAGS_IP{,6}_FORWARD bits on the session which we > will output packet, not on the dummy multicast session. > > npppd(8) clears these flags before release IP address assigned to > sess

pppx(4): don't output packets when no PIPEX_SFLAGS_IP{,6}_FORWARD flags are set

2022-06-26 Thread Vitaliy Makkoveev
npppd(8) clears these flags before it releases IP address assigned to pipex(4) session. This IP could be used for other session, so we should not process packets when these flags are not set. We do PIPEX_SFLAGS_IP{,6}_FORWARD flags check within pipex_ip_output() called by pppac_qstart(), but the

Check `flags' on target session, not multicast session

2022-06-26 Thread Vitaliy Makkoveev
We should check PIPEX_SFLAGS_IP{,6}_FORWARD bits on the session which we will output packet, not on the dummy multicast session. npppd(8) clears these flags before release IP address assigned to session. So this IP address could be assigned to other session while our session is still alive. We

Re: Don't take kernel lock on pipex(4) pppoe input

2022-06-26 Thread Vitaliy Makkoveev
On Mon, Jun 27, 2022 at 12:39:11AM +0300, Vitaliy Makkoveev wrote: > On Sun, Jun 26, 2022 at 11:26:13PM +0200, Jeremie Courreges-Anglas wrote: > > On Mon, Jun 27 2022, Vitaliy Makkoveev wrote: > > > On Sun, Jun 26, 2022 at 10:39:10PM +0200, Alexander Bluhm wrote: > > >&

Re: Don't take kernel lock on pipex(4) pppoe input

2022-06-26 Thread Vitaliy Makkoveev
On Sun, Jun 26, 2022 at 11:26:13PM +0200, Jeremie Courreges-Anglas wrote: > On Mon, Jun 27 2022, Vitaliy Makkoveev wrote: > > On Sun, Jun 26, 2022 at 10:39:10PM +0200, Alexander Bluhm wrote: > >> On Sun, Jun 26, 2022 at 07:45:57PM +0300, Vitaliy Makkoveev wrote: > >>

Re: Don't take kernel lock on pipex(4) pppoe input

2022-06-26 Thread Vitaliy Makkoveev
On Sun, Jun 26, 2022 at 10:39:10PM +0200, Alexander Bluhm wrote: > On Sun, Jun 26, 2022 at 07:45:57PM +0300, Vitaliy Makkoveev wrote: > > This extra serialization is not required. In packet processing path we > > have shared netlock held, but we do read-only access on per ses

Reset `idle_time' timeout only on opened pipex(4) sessions

2022-06-26 Thread Vitaliy Makkoveev
When pipex(4) session state is PIPEX_STATE_CLOSE_WAIT{,2} this means the session is already reached time to live timeout, and the garbage collector waits a little to before kill it. The meaning of `idle_time' has been changed. Don't reset `idle_time' timeout for such sessions in packet processing

Don't take kernel lock on pipex(4) pppoe input

2022-06-26 Thread Vitaliy Makkoveev
This extra serialization is not required. In packet processing path we have shared netlock held, but we do read-only access on per session `flags' and `ifindex'. We always modify them from ioctl(2) path with exclusive netlock held. The rest of pipex(4) session is immutable or uses per-session

Re: pipex(4): protect global lists with mutex(9)

2022-06-26 Thread Vitaliy Makkoveev
On Sun, Jun 26, 2022 at 02:37:41PM +0200, Alexander Bluhm wrote: > On Wed, Jun 22, 2022 at 02:46:32PM +0300, Vitaliy Makkoveev wrote: > > @@ -547,9 +547,11 @@ ether_input(struct ifnet *ifp, struct mb > > > > if ((session = pipex_pppoe_looku

Re: pipex(4): protect global lists with mutex(9)

2022-06-22 Thread Vitaliy Makkoveev
Please drop previous diff. counters_read(9) could sleep, so we can't call it with mutex(9) held. The diff below still uses `pipex_list_mtx' mutex(9) for pipex(4) lists protection, but for safe `session' dereference it user reference counters. Index: sys/net/if_ethersubr.c

Re: pipex(4): protect global lists with mutex(9)

2022-06-20 Thread Vitaliy Makkoveev
ote: > > On 20.6.2022. 16:46, Vitaliy Makkoveev wrote: >> And use this mutex(9) to make the `session' dereference safe. >> >> Hrvoje Popovski pointed me, he triggered netlock assertion with pipex(4) >> pppoe sessions: >> > > Hi, > > I can trigger thi

pipex(4): protect global lists with mutex(9)

2022-06-20 Thread Vitaliy Makkoveev
And use this mutex(9) to make the `session' dereference safe. Hrvoje Popovski pointed me, he triggered netlock assertion with pipex(4) pppoe sessions: > > r420-1# splassert: pipex_ppp_output: want 2 have 0 > Starting stack trace... > pipex_ppp_output(fd80cb7c9500,800022761200,21) at >

Re: ipsec acquire refcount fix

2022-06-14 Thread Vitaliy Makkoveev
> On 14 Jun 2022, at 15:47, Alexander Bluhm wrote: > > Hi, > > I made a little mistake when adding acquire refcount. The timeout > does not decrement the counter to 0 properly. > > We have one reference count for the lists, and one for the timeout > handler. When the timout fires, it has to

Re: potential memory leak in if_vinput()

2022-06-07 Thread Vitaliy Makkoveev
On Tue, Jun 07, 2022 at 02:23:07AM +0200, Alexandr Nedvedicky wrote: > Hello, > > I've spotted this glitch while hunting down use after-free > in 'veb' packet path. I believe the issue is rather hypothetical, > there is no evidence the deemed memory leak ever occurred. > > Anyway I believe the

pipex(4): convert bit fields to flags

2022-05-21 Thread Vitaliy Makkoveev
'pipex_mppe' and 'pipex_session' structures have uint16_t bit fields which represent flags. We mix unlocked access to immutable flags with protected access to mutable ones. This could be not MP independent on some architectures, so convert them fields to u_int `flags' variables. bluhm@ pointed

Re: kernel lock in arp

2022-05-21 Thread Vitaliy Makkoveev
This diff looks good, except the re-check after kernel lock. It’s supposed `rt’ could became inconsistent, right? But what stops to make it inconsistent after first unlocked RTF_LLINFO flag check? I think this re-check should gone. > On 21 May 2022, at 19:45, Hrvoje Popovski wrote: > > On

Re: kernel lock in arp

2022-05-18 Thread Vitaliy Makkoveev
> On 18 May 2022, at 18:31, Alexander Bluhm wrote: > > Hi, > > For parallel IP forwarding I had put kernel lock around arpresolve() > as a quick workaround for crashes. Moving the kernel lock inside > the function makes the hot path lock free. I see slight prerformace > increase in my test

Fine grained unix(4) sockets locking

2022-05-16 Thread Vitaliy Makkoveev
Hi, The diff below switches unix(4) sockets locking from one global `unp_lock' rwlock(9) to per-socket `so_lock' rwlock(9). I already posted this diff to tech@. Also this diff was tested in the snaps without any issue. This version was updated to be against the most recent sources. I want to

Re: Unlock umask(2)

2022-05-12 Thread Vitaliy Makkoveev
> On 13 May 2022, at 00:02, Alexander Bluhm wrote: > > On Wed, May 11, 2022 at 11:20:15AM +0300, Vitaliy Makkoveev wrote: >> sys_umask() only modifies `fd_cmask', which modification is already >> protected by `fd_lock' rwlock(9). > > I did run full regress on amd64

Re: Unlock umask(2)

2022-05-11 Thread Vitaliy Makkoveev
On Wed, May 11, 2022 at 02:33:34PM -0600, Theo de Raadt wrote: > Vitaliy Makkoveev wrote: > > > On Wed, May 11, 2022 at 01:22:51PM -0600, Theo de Raadt wrote: > > > Vitaliy Makkoveev wrote: > > > >

Re: Unlock umask(2)

2022-05-11 Thread Vitaliy Makkoveev
On Wed, May 11, 2022 at 01:22:51PM -0600, Theo de Raadt wrote: > Vitaliy Makkoveev wrote: > > > Index: sys/sys/filedesc.h > > === > > RCS file: /cvs/src/sys/sys/filedesc.h,v > > retrieving revis

Re: Unlock umask(2)

2022-05-11 Thread Vitaliy Makkoveev
On Wed, May 11, 2022 at 09:49:34AM -0600, Theo de Raadt wrote: > Alexander Bluhm wrote: > > > On Wed, May 11, 2022 at 11:20:15AM +0300, Vitaliy Makkoveev wrote: > > > sys_umask() only modifies `fd_cmask', which modification is already > > > protected by `fd_loc

Unlock umask(2)

2022-05-11 Thread Vitaliy Makkoveev
sys_umask() only modifies `fd_cmask', which modification is already protected by `fd_lock' rwlock(9). Index: sys/kern/syscalls.master === RCS file: /cvs/src/sys/kern/syscalls.master,v retrieving revision 1.223 diff -u -p -r1.223

Re: divert packet kernel lock

2022-05-06 Thread Vitaliy Makkoveev
sbappendaddr() has no sleep points, so this should work. I have no objections to commit this as quick and dirty fix. Otherwise the network stack parallelisation diff should be reverted. > On 6 May 2022, at 15:48, Alexander Bluhm wrote: > > On Thu, May 05, 2022 at 11:10:54PM +0200, Mark Kettenis

Re: Add refcnt_read()

2022-03-15 Thread Vitaliy Makkoveev
On Tue, Mar 15, 2022 at 09:11:30AM +, Visa Hankala wrote: > This patch adds a function for getting a snapshot of a reference > counter. This will let code like crcopy() access the value in an > API-observing way. > > OK? ok mvs@ > > Index: share/man/man9/refcnt_init.9 >

Re: [External] : ipsec acquire mutex refcount

2022-03-14 Thread Vitaliy Makkoveev
> On 15 Mar 2022, at 00:45, Alexandr Nedvedicky > wrote: > > Hello, > > On Tue, Mar 15, 2022 at 12:37:00AM +0300, Vitaliy Makkoveev wrote: >> Hi, >> >> Why do you want to initialize `ipa’ variable in >> ipsp_pending_acquire() and ipsec_get_acquire

Re: [External] : ipsec acquire mutex refcount

2022-03-14 Thread Vitaliy Makkoveev
Hi, Why do you want to initialize `ipa’ variable in ipsp_pending_acquire() and ipsec_get_acquire()? This doesn’t require. > On 14 Mar 2022, at 13:09, Alexandr Nedvedicky > wrote: > > Hello, > > > > changes looks good. just few nits. > >I took a closer look at ipsec_delete_policy(): >

Re: multicast malloc

2022-03-04 Thread Vitaliy Makkoveev
ok mvs@ > On 4 Mar 2022, at 19:05, Alexander Bluhm wrote: > > Hi, > > in_addmulti() is only called from ioctl(2) or setsockopt(2). Wait > for malloc(9) to make the system call reliable. > > ok? > > bluhm > > Index: netinet/in.c >

Re: pcb init malloc

2022-03-04 Thread Vitaliy Makkoveev
ok mvs@ > On 4 Mar 2022, at 18:32, Alexander Bluhm wrote: > > Hi, > > in_pcbinit() is called during boot. There malloc(9) cannot fail. > If called with M_WAITOK it would panic otherwise. So this error > handling is needles. > > ok? > > bluhm > > Index: netinet/in_pcb.c >

Re: in6_pcbnotify void

2022-03-01 Thread Vitaliy Makkoveev
On Wed, Mar 02, 2022 at 12:47:11AM +0100, Alexander Bluhm wrote: > Hi, > > The return value of in6_pcbnotify() is never used. Make it a void > function. > > ok? > ok mvs@ > bluhm > > Index: netinet/in_pcb.h > === > RCS file:

Re: Unlock getsockname(2) syscall

2022-02-23 Thread Vitaliy Makkoveev
On Wed, Feb 23, 2022 at 11:04:40PM +0100, Alexander Bluhm wrote: > On Thu, Feb 24, 2022 at 12:02:55AM +0300, Vitaliy Makkoveev wrote: > > ping... > > I have tested it on my i386 regress machine, it passed, and then I > forgot. > > OK bluhm@ > Thanks!

Re: Unlock getsockname(2) syscall

2022-02-23 Thread Vitaliy Makkoveev
ping... > On 10 Feb 2022, at 11:25, Vitaliy Makkoveev wrote: > > For inet and UNIX sockets it fills passed 'sockaddr' structure with > socket's address. For key management and route domain sockets it just > returns error. > > ok? > > Inde

Unlock getsockname(2) syscall

2022-02-10 Thread Vitaliy Makkoveev
For inet and UNIX sockets it fills passed 'sockaddr' structure with socket's address. For key management and route domain sockets it just returns error. ok? Index: sys/kern/syscalls.master === RCS file:

Re: unlock mmap(2) for anonymous mappings

2022-01-11 Thread Vitaliy Makkoveev
Does it make sense to document that kernel lock protects `ps_wxcounter’? We have such [K] in other structure definitions. + u_int64_t ps_wxcounter; /* [K] number of W^X violations */ The rest of diff looks good to me. > On 12 Jan 2022, at 02:13, Klemens Nanni wrote: > > On Tue,

Re: unlock mmap(2) for anonymous mappings

2022-01-11 Thread Vitaliy Makkoveev
On Tue, Jan 11, 2022 at 08:15:13AM +, Klemens Nanni wrote: > On Mon, Jan 10, 2022 at 12:06:44PM +, Klemens Nanni wrote: > > On Fri, Dec 31, 2021 at 07:54:53PM +0300, Vitaliy Makkoveev wrote: > > > The uvm_wxabort path within uvm_wxcheck() looks not MP-safe. > >

Unlock getpeername(2)

2022-01-03 Thread Vitaliy Makkoveev
Subj. The getpeername(2) sysckall is pretty simple. For inet and unix sockets it follows the code which was already unlocked with accept(2) unlocking. Just copy the 'sockaddr' structure containing the peer address. For key management and route domain sockets it just copies the read-only global

Re: unlock mmap(2) for anonymous mappings

2021-12-31 Thread Vitaliy Makkoveev
The uvm_wxabort path within uvm_wxcheck() looks not MP-safe. > On 31 Dec 2021, at 12:14, Klemens Nanni wrote: > > Now that mpi has unlocked uvm's fault handler, we can unlock the mmap > syscall to handle MAP_ANON without the big lock. > > sys_mmap() still protects the !MAP_ANON case, i.e. file

Re: parallel ip forwarding

2021-12-30 Thread Vitaliy Makkoveev
> On 30 Dec 2021, at 11:28, YASUOKA Masahiko wrote: > > Hi, > > On Sat, 25 Dec 2021 21:50:47 +0300 > Vitaliy Makkoveev wrote: >> On Fri, Dec 24, 2021 at 12:50:23PM +0100, Alexander Bluhm wrote: >>> On Fri, Dec 24, 2021 at 04:16:28PM +0900, YASUOKA M

Re: parallel ip forwarding

2021-12-25 Thread Vitaliy Makkoveev
On Fri, Dec 24, 2021 at 12:50:23PM +0100, Alexander Bluhm wrote: > On Fri, Dec 24, 2021 at 04:16:28PM +0900, YASUOKA Masahiko wrote: > > > - npppd l2pt ipsecflowinfo is not MP safe > > > > Does this mean the things we are discussing on the "Fix > > ipsp_spd_lookup() for transport mode" thread? I

Re: ip_deliver without kernel lock

2021-12-24 Thread Vitaliy Makkoveev
ok mvs@ > On 25 Dec 2021, at 01:17, Alexander Bluhm wrote: > > Hi, > > ip_deliver() has been called without kernel lock from ip_ours() and > ip6_ours() for a long time. It looks like these two callers in ip6 > input were forgotten to be unlocked. > > ok? > > bluhm > > Index:

Re: parallel ip forwarding

2021-12-24 Thread Vitaliy Makkoveev
`rtt_link’ list also has missing protection when rtfree() called outside netlock. > On 25 Dec 2021, at 01:34, Alexander Bluhm wrote: > > On Fri, Dec 24, 2021 at 02:04:17PM +0100, Alexander Bluhm wrote: >> On Fri, Dec 24, 2021 at 12:55:04AM +0100, Alexander Bluhm wrote: >>> If you use only

Re: ipsec tdb walk

2021-12-19 Thread Vitaliy Makkoveev
On Sat, Dec 18, 2021 at 01:07:00AM +0100, Alexander Bluhm wrote: > Hi, > > There are occasions where the walker in tdb_walk() might sleep. > Case SADB_DUMP is such a case. And mvs@ has a diff that sleeps to > read the counters. So holding the tdb_sadb_mtx() when calling > walker() is not

Re: Per-cpu counters for TDB stats

2021-12-15 Thread Vitaliy Makkoveev
Please drop this diff. > On 15 Dec 2021, at 19:21, Vitaliy Makkoveev wrote: > > The previous version of this diff exposed UAF issue we had after > tdb_delete(). bluhm@ fixed this and proposes to put per-cpu counters > diff again to tree. This is the updated diff to be aga

Per-cpu counters for TDB stats

2021-12-15 Thread Vitaliy Makkoveev
The previous version of this diff exposed UAF issue we had after tdb_delete(). bluhm@ fixed this and proposes to put per-cpu counters diff again to tree. This is the updated diff to be against the resent sources. Index: sys/net/pfkeyv2_convert.c

Re: syskaller igmp leavegroup

2021-12-14 Thread Vitaliy Makkoveev
ok mvs@ > On 14 Dec 2021, at 16:49, Alexander Bluhm wrote: > > Hi, > > syzkaller found a NULL dereference: > > https://syzkaller.appspot.com/bug?id=a7f159c677ec125fe9edef2265e2749f13e24243 > > It looks like inm->inm_rti is NULL. It is set in rti_fill() or not > set if malloc(9) fails.

Re: ipsec ipo tdb mutex

2021-12-14 Thread Vitaliy Makkoveev
ok mvs@ > On 11 Dec 2021, at 22:03, Alexander Bluhm wrote: > > On Sat, Dec 11, 2021 at 12:53:35AM +0100, Alexander Bluhm wrote: >> To cache lookups, the policy ipo is linked to its SA tdb. There >> is a list of SAs that belong to a policy. To make it MP safe we >> need a mutex around these

Re: IPsec TDB locking

2021-12-08 Thread Vitaliy Makkoveev
ok mvs@ > On 8 Dec 2021, at 16:41, Alexander Bluhm wrote: > > On Wed, Dec 08, 2021 at 02:41:47AM +0300, Vitaliy Makkoveev wrote: >> [IN] looks strange. If this field modified after creation it is >> mutable. There are cases when such field could modified only once, >

Re: IPsec TDB locking

2021-12-08 Thread Vitaliy Makkoveev
, at 08:56, Visa Hankala wrote: > > On Wed, Dec 08, 2021 at 02:41:47AM +0300, Vitaliy Makkoveev wrote: >> [IN] looks strange. If this field modified after creation it is >> mutable. There are cases when such field could modified only once, >> but it still is atomic. >> >

Re: IPsec TDB locking

2021-12-07 Thread Vitaliy Makkoveev
[IN] looks strange. If this field modified after creation it is mutable. There are cases when such field could modified only once, but it still is atomic. We have cases where we do assignment only once, like `unp_addr’ when we bind(2)ing socket and we don’t modify if until socket’s destruction.

Re: IPsec delete TDB in ipo cache

2021-12-07 Thread Vitaliy Makkoveev
> On 7 Dec 2021, at 18:31, Alexander Bluhm wrote: > > Hi, > > In ipo_tdb the flow contains a reference counted TDB cache. This > may prevent that tdb_free() is called. It is not a real leak as > ipsecctl -F or termination of iked flush this cache. The kernel > does the cleanup itself if we

Re: Rework UNIX sockets locking to be fine grained

2021-12-06 Thread Vitaliy Makkoveev
Updated on top of latest source. Index: sys/kern/uipc_socket.c === RCS file: /cvs/src/sys/kern/uipc_socket.c,v retrieving revision 1.269 diff -u -p -r1.269 uipc_socket.c --- sys/kern/uipc_socket.c 11 Nov 2021 16:35:09 -

Re: parallel ip forwarding

2021-12-03 Thread Vitaliy Makkoveev
On Fri, Dec 03, 2021 at 08:35:45PM +0100, Alexander Bluhm wrote: > Hi, > > After implementing MP safe refcounting for IPsec TDB, I wonder how > stable my diff for forwarding on multiple CPU is. > > Note that IPsec still has the workaround to disable multiple queues. > We do not have a mutex that

Re: tdb_delete_locked for pfkey

2021-12-03 Thread Vitaliy Makkoveev
ok mvs@ > On 3 Dec 2021, at 20:25, Tobias Heider wrote: > > Hi, > > the diff below adds tdb_delete_locked() for use in pfkeyv2_sa_flush(). > This way we won't have to worry about keeping the inline code and > tdb_delete() in sync. > > ok? > > Index: net/pfkeyv2.c >

Re: Rework UNIX sockets locking to be fine grained

2021-12-02 Thread Vitaliy Makkoveev
Thanks! Bulk build make us sure all packages have been built with new headers. > On 2 Dec 2021, at 16:26, Christian Weisgerber wrote: > > Vitaliy Makkoveev: > >> Include missing "sys/refcnt.h" header to unpcb.h to fix libkvm and >> netstat(1) build. No funct

Re: Rework UNIX sockets locking to be fine grained

2021-12-01 Thread Vitaliy Makkoveev
This diff included into the snaps. > On 30 Nov 2021, at 19:48, Vitaliy Makkoveev wrote: > > Include missing "sys/refcnt.h" header to unpcb.h to fix libkvm and > netstat(1) build. No functional changes. > > I

Re: Rework UNIX sockets locking to be fine grained

2021-12-01 Thread Vitaliy Makkoveev
On Wed, Dec 01, 2021 at 12:55:06PM +0100, Martin Pieuchot wrote: > On 24/11/21(Wed) 15:04, Vitaliy Makkoveev wrote: > > [...] > > Really, this is the simplest way. The shared lock for the pair of > > sockets moves re-lock dances to the connect and disconnect stages whi

Re: ipsp_spd_lookup error

2021-12-01 Thread Vitaliy Makkoveev
On Wed, Dec 01, 2021 at 12:18:30AM +0100, Alexander Bluhm wrote: > Hi, > > I want to ref count the TDB that is returned by ipsp_spd_lookup(). > > Sometimes the TDB returned by ipsp_spd_lookup() is not used. So > refcounting that, does not make sense. As a first step I want to > convert

Re: Rework UNIX sockets locking to be fine grained

2021-11-30 Thread Vitaliy Makkoveev
Include missing "sys/refcnt.h" header to unpcb.h to fix libkvm and netstat(1) build. No functional changes. Index: sys/kern/uipc_socket.c === RCS file: /cvs/src/sys/kern/uipc_socket.c,v retrieving revision 1.269 diff -u -p -r1.269

Re: Fix ipsp_spd_lookup() for transport mode

2021-11-30 Thread Vitaliy Makkoveev
Hi, This question is mostly for bluhm@. Should the gettdbbyflow() grab the extra reference on returned `tdbp' like other other gettdb*() do? I'm pointing this because we are going to not rely on the netlock when doing `tdbp' dereference. Also could this block be rewritten? It looks a little

Re: ipsec: refactor TDBF_DELETED

2021-11-25 Thread Vitaliy Makkoveev
ok mvs@ > On 26 Nov 2021, at 01:37, Tobias Heider wrote: > > On Fri, Nov 26, 2021 at 01:17:22AM +0300, Vitaliy Makkoveev wrote: >> On Thu, Nov 25, 2021 at 10:59:25PM +0100, Alexander Bluhm wrote: >>> On Thu, Nov 25, 2021 at 05:13:16PM +0100, Tobias Heider wrote: >>

Re: ipsec: refactor TDBF_DELETED

2021-11-25 Thread Vitaliy Makkoveev
On Thu, Nov 25, 2021 at 10:59:25PM +0100, Alexander Bluhm wrote: > On Thu, Nov 25, 2021 at 05:13:16PM +0100, Tobias Heider wrote: > > Now with the missing parts from pfkeyv2.c as noticed by Hrvoje. > > We have this code in pfkeyv2_send() > > if

Re: IPsec tdb ref counting

2021-11-25 Thread Vitaliy Makkoveev
On Thu, Nov 25, 2021 at 09:52:54AM +0100, Alexander Bluhm wrote: > On Sat, Nov 13, 2021 at 06:04:07PM +0100, Alexander Bluhm wrote: > > When testing, please check for tdb leaks. > > The diff below was running on my performance setup for more than > 10 hours. iked SA lifetime was about 10

Re: IPsec tdb ref counting

2021-11-24 Thread Vitaliy Makkoveev
> On 24 Nov 2021, at 22:14, Tobias Heider wrote: > > On Wed, Nov 24, 2021 at 03:52:26PM +0100, Alexander Bluhm wrote: >> On Wed, Nov 24, 2021 at 05:12:36PM +0300, Vitaliy Makkoveev wrote: >>> Understood. But his means we encoded double unref when we calling >>>

Re: IPsec tdb ref counting

2021-11-24 Thread Vitaliy Makkoveev
It was not clean for me because we has no extra reference for parallel tdb_delete(). Now I understand how it should work and why ‘TDBF_DELETED’ fixed this. Thanks for explanation. > On 24 Nov 2021, at 17:52, Alexander Bluhm wrote: > > On Wed, Nov 24, 2021 at 05:12:36PM +0300, Vitaliy

Re: IPsec tdb ref counting

2021-11-24 Thread Vitaliy Makkoveev
On Wed, Nov 24, 2021 at 02:12:15PM +0100, Alexander Bluhm wrote: > On Wed, Nov 24, 2021 at 02:48:06AM +0300, Vitaliy Makkoveev wrote: > > >> ddb{2}> show panic > > >> *cpu2: pool_do_get: tdb free list modified: page 0x8801; > > >> item ad

Re: Rework UNIX sockets locking to be fine grained

2021-11-24 Thread Vitaliy Makkoveev
On Wed, Nov 24, 2021 at 11:36:22AM +0100, Martin Pieuchot wrote: > On 22/11/21(Mon) 14:42, Vitaliy Makkoveev wrote: > > On Sat, Nov 20, 2021 at 03:12:31AM +0300, Vitaliy Makkoveev wrote: > > > Updated diff. Re-lock dances were simplified in the unix(4) sockets > > > la

Re: IPsec tdb ref counting

2021-11-23 Thread Vitaliy Makkoveev
> On 23 Nov 2021, at 18:16, Tobias Heider wrote: > > On Tue, Nov 23, 2021 at 02:18:26PM +0100, Alexander Bluhm wrote: >> On Tue, Nov 23, 2021 at 06:54:59AM +0100, Hrvoje Popovski wrote: >>> after 24 hours hitting sasyncd setup one box panic >> >> Thanks for testing. >> >> I have reduced my

Re: IPsec tdb ref counting

2021-11-23 Thread Vitaliy Makkoveev
On Tue, Nov 23, 2021 at 06:54:59AM +0100, Hrvoje Popovski wrote: > On 21.11.2021. 23:36, Alexander Bluhm wrote: > > Updated tdb refcounting diff after merging with mvs@'s commit. > > Hi, > > after 24 hours hitting sasyncd setup one box panic > > r620-2# panic: pool_do_get: tdb free list

Unlock accept(2) and accept4(2) syscalls

2021-11-22 Thread Vitaliy Makkoveev
Since the rev1.267 of kern/uipc_socket solock() is used as klist lock for sockets to make socket event filters MP-safe. This means KNOTE(9) within doaccept() doesn't require kernel lock to be held and the accept(2) and accept4(2) syscalls could be unlocked. This makes sense because all our sockets

Re: Rework UNIX sockets locking to be fine grained

2021-11-22 Thread Vitaliy Makkoveev
On Sat, Nov 20, 2021 at 03:12:31AM +0300, Vitaliy Makkoveev wrote: > Updated diff. Re-lock dances were simplified in the unix(4) sockets > layer. > > Reference counters added to unix(4) sockets layer too. This makes > pointer dereference of peer's control block always saf

Re: Expose TDBs with excess hardlimit in ipsec(4) statistics

2021-11-20 Thread Vitaliy Makkoveev
> On 21 Nov 2021, at 04:14, Alexander Bluhm wrote: > > On Sun, Nov 21, 2021 at 02:56:58AM +0300, Vitaliy Makkoveev wrote: >> Also I found the `ipsec_notdb' counter description in header doesn't >> math to netstat(1) description. We never count `ipsec_notdb' and the >

Re: Rework UNIX sockets locking to be fine grained

2021-11-20 Thread Vitaliy Makkoveev
On Sat, Nov 20, 2021 at 07:16:40PM +0100, Matthias Schmidt wrote: > Hi Vitaliy, > > * Vitaliy Makkoveev wrote: > > Updated diff. Re-lock dances were simplified in the unix(4) sockets > > layer. > > I am running this diff and the one before on a Thinkpad T450s which i

Re: IPsec tdb ref counting

2021-11-20 Thread Vitaliy Makkoveev
On Sat, Nov 20, 2021 at 05:31:33PM +0100, Alexander Bluhm wrote: > New tdb refcounting diff. > > I delete and unref the timeouts earlier and fixed some leaks. At > least on my machine it does not crash and tcp InUse is 0 after > ipsecctl -F . > > Please test. > > mvs: Are some of your

Expose TDBs with excess hardlimit in ipsec(4) statistics

2021-11-20 Thread Vitaliy Makkoveev
When I had playing with bluhm@'s ipsec(4) diffs I found my system always manages to complete rekeying and I have no TDB's with hardlimit excess. It seems I'm not the only such person and other testers also don't have panics Hrvoje Popovski had. I want to count the TDBs with hardlimit excess as

Re: Rework UNIX sockets locking to be fine grained

2021-11-19 Thread Vitaliy Makkoveev
Updated diff. Re-lock dances were simplified in the unix(4) sockets layer. Reference counters added to unix(4) sockets layer too. This makes pointer dereference of peer's control block always safe after re-lock. The `unp_refs' list cleanup done in the unp_detach(). This removes the case where

Re: IPsec tdb ref counting

2021-11-19 Thread Vitaliy Makkoveev
I want to note all the people who testing this diff. Please be sure your test exceeds the lifetime of the `tdb’ and some rekeying cycles have been made. > On 19 Nov 2021, at 20:09, Alexander Bluhm wrote: > > Hi, > > I got a new hardware in my testlab and could panic this machine > easily with

Rework UNIX sockets locking to be fine grained

2021-11-18 Thread Vitaliy Makkoveev
The UNIX sockets garbage collector was moved out of global `unp_lock' rwlock(9) which locks the whole layer. Now it's used to protect per-socket data and it's time to replace it to per-socket's `so_lock'. Unlike PF_ROUTE and PF_KEY sockets we have the paths where multiple sockets should be locked

Re: IPsec tdb ddb print

2021-11-16 Thread Vitaliy Makkoveev
On Mon, Nov 15, 2021 at 05:23:43PM +0100, Alexander Bluhm wrote: > Hi, > > To debug IPsec and tdb refcounting it may be useful to have "show > tdb" and "show all tdbs" in ddb. > > ok? > Indeed this is useful. ok mvs@ > bluhm > > Index: share/man/man4/ddb.4 >

Re: IPsec tdb ref counting

2021-11-15 Thread Vitaliy Makkoveev
On Sun, Nov 14, 2021 at 10:50:34PM +0100, Alexander Bluhm wrote: > On Sat, Nov 13, 2021 at 06:04:07PM +0100, Alexander Bluhm wrote: > > It passes regress but there are setups that are not covered. Bridge > > and pfsync with IPsec and TCP signature need special care. > > > > When testing, please

Re: IPsec tdb ref counting

2021-11-15 Thread Vitaliy Makkoveev
On Mon, Nov 15, 2021 at 02:51:16PM +0100, Hrvoje Popovski wrote: And you don'n see "> tdb_free() killing ourself" in dmesg output? > On 15.11.2021. 13:11, Vitaliy Makkoveev wrote: > > Hi, > > > > Could you try this diff? It should

Re: IPsec tdb ref counting

2021-11-15 Thread Vitaliy Makkoveev
On Mon, Nov 15, 2021 at 12:35:13PM +0100, Hrvoje Popovski wrote: > On 14.11.2021. 22:50, Alexander Bluhm wrote: > > New diff with fix from mvs@. Please continue testing with this one. > > Hi, > > i've applied this diff on sasyncd setup with two ipsec sessions and i'm > getting this panic. Box

Re: UNIX sockets: move garbage collector data out from `unp_lock'

2021-11-15 Thread Vitaliy Makkoveev
ping... On Sat, Nov 13, 2021 at 07:20:23PM +0300, Vitaliy Makkoveev wrote: > On Fri, Nov 12, 2021 at 03:28:42AM +0300, Vitaliy Makkoveev wrote: > > The final step before rework UNIX sockets to fine grained locks. Except > > `unp_ino' this leaves only per-socket data protect

Re: IPsec tdb ref counting

2021-11-13 Thread Vitaliy Makkoveev
Hi, Do you have panics with this diff? Index: sys/net/if_bridge.c === RCS file: /cvs/src/sys/net/if_bridge.c,v retrieving revision 1.358 diff -u -p -r1.358 if_bridge.c --- sys/net/if_bridge.c 11 Nov 2021 18:08:17 - 1.358

Re: IPsec tdb ref counting

2021-11-13 Thread Vitaliy Makkoveev
On Sat, Nov 13, 2021 at 09:49:31PM +, Stuart Henderson wrote: > On 2021/11/13 18:04, Alexander Bluhm wrote: > > Hi, > > > > To make IPsec MP safe we need refcounting for the tdb. The diff > > below is part of something bigger we have at genua. Although it > > does not cover timeouts and the

Re: UNIX sockets: move garbage collector data out from `unp_lock'

2021-11-13 Thread Vitaliy Makkoveev
On Fri, Nov 12, 2021 at 03:28:42AM +0300, Vitaliy Makkoveev wrote: > The final step before rework UNIX sockets to fine grained locks. Except > `unp_ino' this leaves only per-socket data protected by `unp_lock'. The > `unp_ino' protection is not the big deal and will be done wit

UNIX sockets: move garbage collector data out from `unp_lock'

2021-11-11 Thread Vitaliy Makkoveev
The final step before rework UNIX sockets to fine grained locks. Except `unp_ino' this leaves only per-socket data protected by `unp_lock'. The `unp_ino' protection is not the big deal and will be done with mutex(9) in the future diff. The `unp_gc_lock' rwlock(9) protects `unp_defer',

  1   2   3   4   5   >