On Thu, Mar 31, 2022 at 6:19 AM Waldek Kozaczuk <[email protected]>
wrote:

> BTW I have run more tests to see any performance gains and I failed to
> notice any significant differences. So maybe atomics are not that expensive
> in this case. But I think my proposal still stands.
>
> I also noticed that if I bump the number of iterations to over 100K at
> some point the client fails to connect with EADDRNOTAVAIL on both x64 and
> aarch64:
>

I'm not familiar with this test, but this sort of failure often happens
because of the TIME_WAIT state, which prevents reuse of the same client and
server port pair by another connection within a timeout (2 minutes?).
Maybe SO_REUSEADDR will help here?


>
> in_pcb.cc: EADDRNOTAVAIL: 0, last - first:55535
> tst-tcp1: client: connect() failed after i=111840!
> [tst-tcp1: client]: Address not available
>
> I pinpointed it to this fragment of in_pcb_lport() in
> bsd/sys/netinet/in_pcb.cc (see the lines 605 and 606 in bold):
>
> 598         if (dorandom)
>  599                 *lastport = first + (arc4random() % (last - first));
>  600
>  601         count = last - first;
>  602
>  603         do {
>  604                 if (count-- < 0)        /* completely used? */ {
>
> *605                         printf("in_pcb.cc: EADDRNOTAVAIL: 0, last -
> first:%ld\n", last - first); 606                         return
> (EADDRNOTAVAIL);*
>  607                 }
>  608                 ++*lastport;
>  609                 if (*lastport < first || *lastport > last)
>  610                         *lastport = first;
>  611                 lport = htons(*lastport);
>  612
>  613 #ifdef INET6
>  614                 if ((inp->inp_vflag & INP_IPV6) != 0)
>  615                         tmpinp = in6_pcblookup_local(pcbinfo,
>  616                             &inp->in6p_laddr, lport, lookupflags,
> cred);
>  617 #endif
>  618 #if defined(INET) && defined(INET6)
>  619                 else
>  620 #endif
>  621 #ifdef INET
>  622                         tmpinp = in_pcblookup_local(pcbinfo, laddr,
>  623                             lport, lookupflags, cred);
>  624 #endif
>  625         } while (tmpinp != NULL);
>  626
>  627 #ifdef INET
>  628         if ((inp->inp_vflag & (INP_IPV4|INP_IPV6)) == INP_IPV4)
>  629                 laddrp->s_addr = laddr.s_addr;
>  630 #endif
>  631         *lportp = lport;
>  632
>  633         return (0);
>  634 }
>
> I wonder if we somehow ran out of available ports because maybe we do not
> clean/make them available fast enough?
>
> On Monday, March 28, 2022 at 9:09:04 PM UTC-4 Waldek Kozaczuk wrote:
>
>> Hi,
>>
>> As I was working on a patch to address the issue
>> https://github.com/cloudius-systems/osv/issues/1087, I discovered that
>> occasionally it would crash or hang while running a test
>> tst-bsd-tcp1-zsndrcv.cc. The crash would look like this:
>>
>> page fault outside application, addr: 0x0000000000000000
>> [registers]
>> PC: 0x0000000fc011ca5c <ifa_ifwithnet+152>
>> X00: 0x0000000000000012 X01: 0xffff90004088f4a0 X02: 0x0000000000000000
>> X03: 0xffff800040f668c0 X04: 0xffffa00040e778e8 X05: 0xffff800040f66950
>> X06: 0x0000000000000000 X07: 0x0000000000000000 X08: 0x0000000000000000
>> X09: 0x0000000000000000 X10: 0x0000000000000000 X11: 0x0000000000000000
>> X12: 0x0000000000004000 X13: 0x0000000000000000 X14: 0x00000000000000b0
>> X15: 0x0000000000000000 X16: 0x0000000000000001 X17: 0x0000000000000001
>> X18: 0x0000000055697ee1 X19: 0xffffa0004088e800 X20: 0xffff800040f66840
>> X21: 0xffff90004088f000 X22: 0x0000000fc0672838 X23: 0x0000000000000000
>> X24: 0x0000000000000000 X25: 0xffff90004088f4a0 X26: 0x0000000000000001
>> X27: 0x00000000ffffffff X28: 0x0000000000000002 X29: 0xffff800040f667a0
>> X30: 0x0000000fc011ca2c SP:  0xffff800040f667a0 ESR: 0x0000000096000006
>> PSTATE: 0x0000000020000345
>> Aborted
>>
>> [backtrace]
>> 0x0000000fc014c2e4 <mmu::vm_fault(unsigned long, exception_frame*)+724>
>> 0x0000000fc017e4c8 <page_fault+100>
>> 0x0000000fc017e254 <???+-1072176556>
>> 0x0000000fc01234dc <???+-1072548644>
>> 0x0000000fc0124a80 <in_pcbconnect_setup+192>
>> 0x0000000fc0145920 <???+-1072408288>
>> 0x0000000fc0145afc <???+-1072407812>
>> 0x0000000fc0100bb8 <kern_connect+120>
>> 0x0000000fc010de7c <linux_connect+60>
>> 0x0000000fc010233c <connect+12>
>> 0x00001000000376a0 <???+226976>
>> 0x00001000000377e0 <???+227296>
>> 0x0000000fc0257440 <thread_main_c+32>
>> 0x0000000fc017d784 <???+-1072179324>
>>
>> ---------
>> (gdb) bt
>> #0  processor::wfi () at arch/aarch64/processor.hh:52
>> #1  processor::halt_no_interrupts () at arch/aarch64/processor.hh:52
>> #2  arch::halt_no_interrupts () at arch/aarch64/arch.hh:48
>> #3  osv::halt () at arch/aarch64/power.cc:24
>> #4  0x0000000fc00e9fcc in abort (fmt=fmt@entry=0xfc0481190 "Aborted\n")
>> at runtime.cc:142
>> #5  0x0000000fc00a0654 in abort () at runtime.cc:106
>> #6  0x0000000fc014c2e8 in mmu::vm_sigsegv (ef=0xffff800040f66680, addr=0)
>> at core/mmu.cc:1420
>> #7  mmu::vm_sigsegv (ef=0xffff800040f66680, addr=0) at core/mmu.cc:1414
>> #8  mmu::vm_fault (addr=0, addr@entry=1, ef=ef@entry=0xffff800040f66680)
>> at core/mmu.cc:1442
>> #9  0x0000000fc017e4cc in page_fault (ef=0xffff800040f66680) at
>> arch/aarch64/mmu.cc:50
>> #10 <signal handler called>
>> #11 0x0000000fc011ca5c in ifa_ifwithnet (addr=addr@entry=0xffff800040f66840,
>> ignore_ptp=0) at bsd/sys/net/if.cc:1434
>> #12 0x0000000fc01234e0 in in_pcbladdr (inp=inp@entry=0xffffa000411c1600,
>> faddr=faddr@entry=0xffff800040f669d8, laddr=laddr@entry=0xffff800040f669d0,
>> cred=0x0)
>>     at bsd/sys/netinet/in_pcb.cc:969
>> #13 0x0000000fc0124a84 in in_pcbconnect_setup 
>> (inp=inp@entry=0xffffa000411c1600,
>> nam=nam@entry=0xffffa00040b046d0, laddrp=laddrp@entry=0xffff800040f66a30,
>>
>>     lportp=lportp@entry=0xffff800040f66a2e, 
>> faddrp=faddrp@entry=0xffffa000411c1664,
>> fportp=fportp@entry=0xffffa000411c1654, oinpp=oinpp@entry=0xffff800040f66a38,
>>
>>     cred=cred@entry=0x0) at bsd/sys/netinet/in_pcb.cc:1056
>> #14 0x0000000fc0145924 in tcp_connect (tp=tp@entry=0xffffa00040e5e400,
>> nam=nam@entry=0xffffa00040b046d0, td=<optimized out>) at
>> bsd/sys/netinet/tcp_usrreq.cc:1089
>> #15 0x0000000fc0145b00 in tcp_usr_connect (td=<optimized out>,
>> nam=0xffffa00040b046d0, so=<optimized out>) at
>> bsd/sys/netinet/tcp_usrreq.cc:463
>> #16 tcp_usr_connect (so=<optimized out>, nam=0xffffa00040b046d0,
>> td=<optimized out>) at bsd/sys/netinet/tcp_usrreq.cc:436
>> #17 0x0000000fc0100bbc in kern_connect (fd=fd@entry=4,
>> sa=0xffffa00040b046d0) at bsd/sys/kern/uipc_syscalls.cc:377
>> #18 0x0000000fc010de80 in linux_connect (s=s@entry=4, 
>> name=name@entry=0xffff800040f66be8,
>> namelen=namelen@entry=16) at bsd/sys/compat/linux/linux_socket.cc:684
>> #19 0x0000000fc0102340 in connect (fd=fd@entry=4, 
>> addr=addr@entry=0xffff800040f66be8,
>> len=len@entry=16) at bsd/sys/kern/uipc_syscalls_wrap.cc:128
>> #20 0x00001000000376a4 in test_bsd_tcp1::tcp_client (this=0x2000001ffa98)
>> at /home/wkozaczuk/projects/osv/tests/tst-bsd-tcp1-zrcv.cc:169
>> #21 0x00001000000377e4 in
>> test_bsd_tcp1::run()::{lambda()#2}::operator()() const
>> (__closure=<optimized out>)
>>     at /home/wkozaczuk/projects/osv/tests/tst-bsd-tcp1-zrcv.cc:230
>> #22 std::__invoke_impl<void,
>> test_bsd_tcp1::run()::{lambda()#2}&>(std::__invoke_other,
>> test_bsd_tcp1::run()::{lambda()#2}&) (__f=...)
>>     at /usr/include/c++/10/bits/invoke.h:60
>> #23 std::__invoke_r<void,
>> test_bsd_tcp1::run()::{lambda()#2}&>(std::__is_invocable&&,
>> (test_bsd_tcp1::run()::{lambda()#2}&)...) (__fn=...)
>>     at /usr/include/c++/10/bits/invoke.h:153
>> #24 std::_Function_handler<void (),
>> test_bsd_tcp1::run()::{lambda()#2}>::_M_invoke(std::_Any_data const&)
>> (__functor=...)
>>     at /usr/include/c++/10/bits/std_function.h:291
>> #25 0x0000000fc0257444 in sched::thread::main (this=0xffffa00040c72e00)
>> at core/sched.cc:1267
>> #26 sched::thread_main_c (t=0xffffa00040c72e00) at
>> arch/aarch64/arch-switch.hh:162
>> #27 0x0000000fc017d788 in thread_main () at arch/aarch64/entry.S:116
>> (gdb) frame 11
>> #11 0x0000000fc011ca5c in ifa_ifwithnet (addr=addr@entry=0xffff800040f66840,
>> ignore_ptp=0) at bsd/sys/net/if.cc:1434
>> 1434                            if (ifa->ifa_addr->sa_family != af)
>> (gdb) p ifa->ifa_addr
>> $1 = (bsd_sockaddr *) 0xffff80004088e800
>> (gdb) p ifa->ifa_addr->sa_family
>> $2 = 232 '\350'
>> (gdb) p af
>> $3 = 2
>>
>> As you can see the fault happened in line 1434 of if.cc:
>> *1434                         if (ifa->ifa_addr->sa_family != af)*
>> 1435 next:                           continue;
>> when dereferencing the field ifa_addr of the ifa pointer.
>>
>> In this case, gdb actually showed *ifa_addr* with a valid non-zero
>> address (which is interesting) but most of the time when I connected I
>> would see *ifa_addr* equal to 0.
>>
>> Sometimes this test would "hang", which I discovered was a tcp server
>> thread left listening on a socket after tcp client thread terminated when
>> its loop broke due to an error after calling connect() on a socket:
>>
>> for (int i=0; i < ITERATIONS; i++) {
>> ...
>>              dbg_d("client: connecting to 127.0.0.1...");
>>              if ( connect(s, (struct sockaddr *)&raddr, sizeof(raddr)) <
>> 0 ) {
>>                 dbg_d("client: connect() failed!");
>>                  return -1;
>>              }
>> }
>>
>> Initially, I thought that these "hangs" or crashes were not really
>> specific to TCP stack, but instead just random manifestations of some bug
>> in my patch fixing issue 1087. The patch was about changing preboot.S,
>> boot.S, and core/mmu.cc to move aarch64 kernel code to a different much
>> higher place in virtual memory so that the same loader.elf/kernel.elf could
>> run correctly on both QEMU and firecracker. I speculated that somehow some
>> of virt/phys or phys/virt translation routines have bugs, then possibly
>> something wrong in the new assembly code updating early boot mapping page
>> tables, or finally some dirty cache lines and some missing *isb* or *dsb*
>> instruction. Nothing was helping and at some point, I found that strangely
>> adding a nop instruction to premain() in loader.cc shifting the address of
>> premain would somehow make this crash go away (or most likely I did not see
>> it as often). Then I started debugging the TCP stack and code around the
>> crash and "hang" to understand why exactly this was misbehaving. It had to
>> do with some memory mapping bug I thought.
>>
>> To make this long story short I eventually took a step back and tried to
>> reproduce the exact same crashes/"hangs" on the original version of the
>> code before the patch as it is on master now. Also during the research, I
>> realized that the other 3 tst-bsd-tcp1-*cc tests were failing in a similar
>> way.
>>
>> tst-bsd-tcp1.cc
>> tst-bsd-tcp1-zrcv.cc
>> tst-bsd-tcp1-zsnd.cc
>>
>> All of these tests exercise basic socket API fundamentals where one
>> thread acts as a TCP server listening on some port and another thread
>> acting a client in a loop of 400 iterations connecting to the server,
>> sending some data and reading it back as an echo, and closing the socket.
>> The 3 -z* ones exercise the zero-copy API to send and/or receive data.
>>
>> After I increased the number of loops to something much higher like 100K
>> I started seeing the exact same errors after 5-10K of iterations with the
>> 0-address faults being more dominant. So clearly the issue was not caused
>> by my kernel memory layout changes but due to some old bug and somehow my
>> patch changes made these more frequent. I also tried to reproduce it on
>> x86_64 with the same number of iterations but to no avail.
>>
>> Now a little by luck and little by deduction, I managed to realize that
>> the page fault might be caused by something writing over/erasing the memory
>> (stack overflow?) where the ifa_addr field of the *bsd_ifaddr* struct
>> (ifa variable in the example above). Here is the struct defined in
>> the bsd/sys/net/if_var.h (it will be quite important in this long email
>> below):
>>
>> /*
>>  * The bsd_ifaddr structure contains information about one address
>>  * of an interface.  They are maintained by the different address
>> families,
>>  * are allocated and attached when an address is set, and are linked
>>  * together so all addresses for an interface can be located.
>>  *
>>  * NOTE: a 'struct bsd_ifaddr' is always at the beginning of a larger
>>  * chunk of malloc'ed memory, where we store the three addresses
>>  * (ifa_addr, ifa_dstaddr and ifa_netmask) referenced here.
>>  */
>> struct bsd_ifaddr {
>>         struct  bsd_sockaddr *ifa_addr; /* address of interface */
>>         struct  bsd_sockaddr *ifa_dstaddr;      /* other end of p-to-p
>> link */
>> #define ifa_broadaddr   ifa_dstaddr     /* broadcast address interface */
>>         struct  bsd_sockaddr *ifa_netmask;      /* used to determine
>> subnet */
>>         struct  if_data if_data;        /* not all members are meaningful
>> */
>>         struct  ifnet *ifa_ifp;         /* back-pointer to interface */
>>         TAILQ_ENTRY(bsd_ifaddr) ifa_link;       /* queue macro glue */
>>         void    (*ifa_rtrequest)        /* check or clean routes (+ or
>> -)'d */
>>                 (int, struct rtentry *, struct rt_addrinfo *);
>>         u_short ifa_flags;              /* mostly rt_flags for cloning */
>>         *u_int   ifa_refcnt;*             /* references to this
>> structure */
>>         int     ifa_metric;             /* cost of going out this
>> interface */
>>         int (*ifa_claim_addr)           /* check if an addr goes to this
>> if */
>>                 (struct bsd_ifaddr *, struct bsd_sockaddr *);
>>         struct mtx ifa_mtx;
>> };
>>
>> Obviously, finding what possibly erases this field would be like looking
>> for a needle in the stack of hay. Then I theorized, that the memory
>> allocated to hold the bsd_ifaddr may be freed somehow and used again for
>> the next malloc and end up being written to by something completely else.
>>
>> To that end, I captured the address of the bsd_ifaddr (for example
>> 0xffffa00040882800) and then put a breakpoint or printf in
>> core/mempool.cc:free() to check if something frees that address. And indeed
>> before the crash or hang it would be caught in this check. Eventually, I
>> managed to capture what calls the free():
>>
>> #9  0x0000000fc0243e10 in memory::pool::free (this=0xffffa00040e94800,
>> object=object@entry=0xffff80004088e800) at core/mempool.cc:334
>> #10 0x0000000fc0244010 in free (object=0xffff80004088e800) at
>> core/mempool.cc:1897
>> #11 free (object=object@entry=0xffffa0004088e800) at core/mempool.cc:1871
>> #12 0x0000000fc011c568 in *ifa_free* (ifa=0xffffa0004088e800) at
>> bsd/sys/net/if.cc:1204
>> #13 ifa_free (ifa=ifa@entry=0xffffa0004088e800) at bsd/sys/net/if.cc:1200
>> #14 0x0000000fc012c328 in ip_preprocess_packet (m=<optimized out>,
>> protocol=@0xffff800040a47f5b: 192 '\300', hlen=@0xffff800040a47f5c: 20)
>>     at bsd/sys/netinet/ip_input.cc:717
>> #15 0x0000000fc012c734 in ip_input (m=<optimized out>) at
>> bsd/sys/netinet/ip_input.cc:770
>> #16 0x0000000fc0117c24 in netisr_process_workstream_proto (proto=1,
>> nwsp=0xfc0671f20 <main_nws>) at bsd/sys/net/netisr.cc:576
>> #17 swi_net (arg=0xfc0671f20 <main_nws>) at bsd/sys/net/netisr.cc:615
>> #18 0x0000000fc0118f9c in netisr_osv_thread_wrapper (handler=0xfc0117b10
>> <swi_net(void*)>, arg=0xfc0671f20 <main_nws>) at bsd/sys/net/netisr1.cc:29
>> #19 0x0000000fc0119010 in operator() (__closure=<optimized out>,
>> __closure=<optimized out>) at bsd/sys/net/netisr1.cc:39
>> #20 std::__invoke_impl<void,
>> netisr_osv_start_thread(netisr_osv_handler_t, void*)::<lambda()>&>
>> (__f=...) at /usr/include/c++/10/bits/invoke.h:60
>> #21 std::__invoke_r<void, netisr_osv_start_thread(netisr_osv_handler_t,
>> void*)::<lambda()>&> (__fn=...) at /usr/include/c++/10/bits/invoke.h:153
>> #22 std::_Function_handler<void(),
>> netisr_osv_start_thread(netisr_osv_handler_t, void*)::<lambda()>
>> >::_M_invoke(const std::_Any_data &) (__functor=...)
>>     at /usr/include/c++/10/bits/std_function.h:291
>> #23 0x0000000fc0257444 in sched::thread::main (this=0xffffa00040860200)
>> at core/sched.cc:1267
>> #24 sched::thread_main_c (t=0xffffa00040860200) at
>> arch/aarch64/arch-switch.hh:162
>> #25 0x0000000fc017d788 in thread_main () at arch/aarch64/entry.S:116
>>
>> And here is where the most critical part of freeBSD networking code in
>> bsd/sys/net/if.cc comes:
>>
>> 1180 /*
>> 1181  * Reference count functions for ifaddrs.
>> 1182  */
>> 1183 void
>> 1184 *ifa_init*(struct bsd_ifaddr *ifa)
>> 1185 {
>> 1186
>> 1187         mtx_init(&ifa->ifa_mtx, "bsd_ifaddr", NULL, MTX_DEF);
>> 1188         refcount_init(&ifa->ifa_refcnt, 1);
>> 1189 }
>> 1190
>> 1191 void
>> 1192 *ifa_ref*(struct bsd_ifaddr *ifa)
>> 1193 {
>> 1194
>> 1195         refcount_acquire(&ifa->ifa_refcnt);
>> 1196 }
>> 1197
>> 1198 void
>> 1199 *ifa_free*(struct bsd_ifaddr *ifa)
>> 1200 {
>> 1201
>> 1202         if (refcount_release(&ifa->ifa_refcnt)) {
>> 1203                 mtx_destroy(&ifa->ifa_mtx);
>> 1204                 free(ifa);
>> 1205         }
>> 1206 }
>>
>> It seems that *ifa_free*() calls mtx_destroy() and free() when the
>> *ifa_refcnt* field of bsd_ifaddr is 0 after refcount_release() is
>> called. See code of refcount_release in bsd/sys/sys/refcount.h:
>>
>> static __inline int
>> refcount_release(volatile u_int *count)
>> {
>>         u_int old;
>>
>>         /* XXX: Should this have a rel membar? */
>>         old = atomic_fetchadd_int(count, -1);
>>         KASSERT(old > 0, ("negative refcount %p", count));
>>         return (old == 1);
>> }
>>
>> BTW the XXX comment made me a little bit suspicious. But before we go
>> there let me try to explain what I figured out about the ifa_refcnt field.
>> The comment from the bsd_ifaddr definition states this:
>>
>>   *u_int   ifa_refcnt;*             /* references to this structure */
>>
>> So it seems to be some sort of reference count field. But what purpose
>> does it serve? The simple grep of *ifa_refcnt *to see where it is used
>> gives this list:
>>
>> ./bsd/sys/net/if_var.h:        u_int        ifa_refcnt;                /*
>> references to this structure */
>> ./bsd/sys/net/if.cc:        refcount_init(&ifa->ifa_refcnt, 1);
>> ./bsd/sys/net/if.cc:        refcount_acquire(&ifa->ifa_refcnt);
>> ./bsd/sys/net/if.cc:        if (refcount_release(&ifa->ifa_refcnt)) {
>>
>> which in essence are the 3 functions from bsd/sys/net/if.cc I put couple
>> of paragraphs above - ifa_init(), ifa_ref() and ifa_free(). I do not know
>> for sure of the intention behind ifa_refcnt (I did study some FreeBSD
>> commits) but based on its use by these 3 functions, it seems to act as a
>> reference count to implement some sort of garbage collection mechanism
>> around *bsd_ifaddr* structures to make sure that it is eventually
>> getting free()-ed after no longer used/referenced. From what I figured the
>> ifa_init() is called during this struct allocation and initialization and
>> there are two places in bsd subtree:
>>
>> - if_attach_internal() function in bsd/sys/net/if.cc
>> - in_control() when the SIOCSIFADDR, SIOCSIFNETMASK or SIOCSIFDSTADDR cmd
>> passed in bsd/sys/netinet/in.cc.
>>
>> In essence, the ifa_init() sets the ifa_mtx field and sets ifa_refcnt of
>> the *bsd_ifaddr *to 1.
>> It turns out that the ifa_ref() is then called in 33 places and
>> ifa_free() is called in 55 places in the bsd subtree. The former increments
>> and the latter decrements the ifa_refcnt counter.
>>
>> So I started studying any changes to how these functions may have changed
>> on OSv side of bsd tree and I have found this commit:
>>
>>
>> https://github.com/cloudius-systems/osv/commit/8ad9f53c5dbcacf5632ef878ab9cf93b0d860b10
>> - which is backport of the commit on freebsd side to remove some redundant
>> (?) occurrences of ifa_ref() and ifa_free() from
>> bsd/sys/netinet/ip_output.cc
>> <https://github.com/cloudius-systems/osv/commit/8ad9f53c5dbcacf5632ef878ab9cf93b0d860b10#diff-172218361051c20090e57b94b614925569c4eb942a49c136923d4251dcb72f53>
>> ,
>>
>> Then, I found this patch in FreeBSD codebase around refcount_release -
>> https://github.com/freebsd/freebsd-src/commit/157d5e6df35748cdb589f743bfeccc04bf9af93e
>> - which changes refcount_release() to look like this:
>>
>> static __inline int
>> refcount_release(volatile u_int *count)
>> {
>>         u_int old;
>>
>>         /* XXX: Should this have a rel membar? */
>>         atomic_thread_fence_rel();
>>         old = atomic_fetchadd_int(count, -1);
>>         KASSERT(old > 0, ("negative refcount %p", count));
>>         return (old == 1);
>>         if (old > 1)
>>                 return (0);
>>
>>         /*
>>          * Last reference.  Signal the user to call the destructor.
>>          *
>>          * Ensure that the destructor sees all updates.  The fence_rel
>>          * at the start of the function synchronized with this fence.
>>          */
>>         atomic_thread_fence_acq();
>>         return (1);
>> }
>>
>> And then some other changes adding and removing  ifa_ref() and ifa_free():
>>
>>    -
>>    
>> https://github.com/freebsd/freebsd-src/commit/1ebec5faf41f65a793b4f4217d7a67125c877510
>>    - fixes missed refs
>>    -
>>    
>> https://github.com/freebsd/freebsd-src/commit/2a4bd982d0530863c7fd2aa1ff2b8de27b8896c7
>>    - introduces new macro NET_EPOCH_CALL in some related places
>>
>> On top of that the ifa_ref() and ifa_free() have been changed and
>> enhanced with new related functions and look like this in the latest
>> version (see
>> https://github.com/freebsd/freebsd-src/blob/main/sys/net/if.c#L1759-L1795
>> ):
>>
>> void
>> ifa_ref(struct ifaddr *ifa)
>> {
>>         u_int old __diagused;
>>
>>         old = refcount_acquire(&ifa->ifa_refcnt);
>>         KASSERT(old > 0, ("%s: ifa %p has 0 refs", __func__, ifa));
>> }
>>
>> int
>> ifa_try_ref(struct ifaddr *ifa)
>> {
>>
>>         NET_EPOCH_ASSERT();
>>         return (refcount_acquire_if_not_zero(&ifa->ifa_refcnt));
>> }
>>
>> static void
>> ifa_destroy(epoch_context_t ctx)
>> {
>>         struct ifaddr *ifa;
>>
>>         ifa = __containerof(ctx, struct ifaddr, ifa_epoch_ctx);
>>         counter_u64_free(ifa->ifa_opackets);
>>         counter_u64_free(ifa->ifa_ipackets);
>>         counter_u64_free(ifa->ifa_obytes);
>>         counter_u64_free(ifa->ifa_ibytes);
>>         free(ifa, M_IFADDR);
>> }
>>
>> void
>> ifa_free(struct ifaddr *ifa)
>> {
>>
>>         if (refcount_release(&ifa->ifa_refcnt))
>>                 NET_EPOCH_CALL(ifa_destroy, &ifa->ifa_epoch_ctx);
>> }
>>
>> The struct ifaddr has changed as well.
>>
>> Why am I listing all these changes? Well, my theory (I think more than
>> that) is that there is some old bug in the reference counting logic around
>> ifa_refcnt in the bsd substree - either un-balanced (too many, too few)
>> number of calls to ifa_ref/ifa_free or buggy refcount_release() function
>> (see TODO comment) or combination of those that somehow only shows its ugly
>> face on aarch64. Or possibly there is some race condition that makes calls
>> to ifa_free() be executed before ifa_ref() even if they are balanced. *Please
>> note that this bug happens on 1 or more CPUs so it is not SMP-specific by
>> any means.*
>>
>> If that is the case then ideally we would want to fix this bug by
>> back-porting all these commits and possibly many more from the current
>> FreeBSD TCP code. And I have actually tried to do that - 1st by backporting
>> the refcount_release() patch which did not help. Then apply some extra
>> changes to add/remove some calls to ifa_ref() and ifa_free(). This led to
>> some new crashes now even on x86_64. All of this has made me think that
>> bringing back all necessary changes from FreeBSD tree to fix this bug would
>> be a very tedious and risky effort. We might make the situation even worse.
>>
>> So instead I realized there is a simpler solution to this problem -
>> eliminate buggy reference counting code like this:
>>
>> diff --git a/bsd/sys/net/if.cc b/bsd/sys/net/if.cc
>> index f717da37..fb8f604c 100644
>> --- a/bsd/sys/net/if.cc
>> +++ b/bsd/sys/net/if.cc
>> @@ -1192,17 +1192,17 @@ void
>>  ifa_ref(struct bsd_ifaddr *ifa)
>>  {
>>
>> -        refcount_acquire(&ifa->ifa_refcnt);
>> +        //refcount_acquire(&ifa->ifa_refcnt);
>>  }
>>
>>  void
>>  ifa_free(struct bsd_ifaddr *ifa)
>>  {
>>
>> -        if (refcount_release(&ifa->ifa_refcnt)) {
>> -                mtx_destroy(&ifa->ifa_mtx);
>> -                free(ifa);
>> -        }
>> +        //if (refcount_release(&ifa->ifa_refcnt)) {
>> +        //        mtx_destroy(&ifa->ifa_mtx);
>> +        //        free(ifa);
>> +        //}
>>  }
>>
>> The  bsd_ifaddr is allocated only a handful of times (=5) and destroyed
>> once during OSv runtime. Here is the list of places that I captured from
>> gdb:
>>
>> (gdb) hbr bsd/sys/net/if.cc:1187
>> Hardware assisted breakpoint 1 at 0xfc011bddc: bsd/sys/net/if.cc:1187. (2
>> locations)
>> (gdb) c
>> Continuing.
>>
>> Breakpoint 1, if_attach_internal (vmove=0, ifp=0xffffa0004088e400) at
>> bsd/sys/net/if.cc:1187
>> 1187                mtx_init(&ifa->ifa_mtx, "bsd_ifaddr", NULL, MTX_DEF);
>> (gdb) bt
>> #0  if_attach_internal (vmove=0, ifp=0xffffa0004088e400) at
>> bsd/sys/net/if.cc:1187
>> #1  if_attach (ifp=ifp@entry=0xffff90004088f000) at bsd/sys/net/if.cc:519
>> #2  0x0000000fc011a390 in lo_clone_create (ifc=0xfc061d908 <lo_cloner>,
>> unit=0, params=<optimized out>) at bsd/sys/net/if_loop.cc:116
>> #3  0x0000000fc011a0d0 in ifc_simple_create (ifc=0xfc061d908 <lo_cloner>,
>> name=0xffff8000407a2d50 "lo0", len=16, params=0x0) at
>> bsd/sys/net/if_clone.cc:572
>> #4  0x0000000fc0119360 in if_clone_createif (ifc=ifc@entry=0xfc061d908
>> <lo_cloner>, name=name@entry=0xffff8000407a2d50 "lo0", len=len@entry=16,
>>     params=params@entry=0x0) at bsd/sys/net/if_clone.cc:171
>> #5  0x0000000fc0119458 in if_clone_createif (params=0x0, len=16,
>> name=0xffff8000407a2d50 "lo0", ifc=0xfc061d908 <lo_cloner>) at
>> bsd/sys/net/if_clone.cc:523
>> #6  ifc_simple_attach (ifc=0xfc061d908 <lo_cloner>) at
>> bsd/sys/net/if_clone.cc:525
>> #7  0x0000000fc0119908 in if_clone_attach (ifc=ifc@entry=0xfc061d908
>> <lo_cloner>) at bsd/sys/net/if_clone.cc:306
>> #8  0x0000000fc011a420 in vnet_loif_init () at bsd/sys/net/if_loop.cc:125
>> #9  0x0000000fc00ec2dc in net_init () at bsd/net.cc:64
>> #10 0x0000000fc00dc99c in main_cont (loader_argc=<optimized out>,
>> loader_argv=<optimized out>) at loader.cc:729
>> #11 0x0000000fc0257444 in sched::thread::main (this=0xfc0663e28) at
>> core/sched.cc:1267
>> #12 sched::thread_main_c (t=0xfc0663e28) at
>> arch/aarch64/arch-switch.hh:162
>> #13 0x0000000fc017d788 in thread_main () at arch/aarch64/entry.S:116
>> (gdb) c
>> Continuing.
>>
>> Breakpoint 1, ifa_init (ifa=ifa@entry=0xffffa0004088e800) at
>> bsd/sys/net/if.cc:1187
>> 1187                mtx_init(&ifa->ifa_mtx, "bsd_ifaddr", NULL, MTX_DEF);
>> (gdb) bt
>> #0  ifa_init (ifa=ifa@entry=0xffffa0004088e800) at bsd/sys/net/if.cc:1187
>> #1  0x0000000fc01219c8 in in_control (so=so@entry=0x0, cmd=1077963034,
>> data=data@entry=0xffff8000407a2d50 "lo0", ifp=ifp@entry=0xffff90004088f000,
>> td=td@entry=0x0)
>>     at bsd/sys/netinet/in.cc:388
>> #2  0x0000000fc0107554 in osv::start_if (if_name="lo0",
>> ip_addr="127.0.0.1", mask_addr="255.0.0.0") at bsd/porting/networking.cc:113
>> #3  0x0000000fc00ec350 in net_init () at
>> /usr/include/c++/10/bits/char_traits.h:322
>> #4  0x0000000fc00dc99c in main_cont (loader_argc=<optimized out>,
>> loader_argv=<optimized out>) at loader.cc:729
>> #5  0x0000000fc0257444 in sched::thread::main (this=0xfc0663e28) at
>> core/sched.cc:1267
>> #6  sched::thread_main_c (t=0xfc0663e28) at
>> arch/aarch64/arch-switch.hh:162
>> #7  0x0000000fc017d788 in thread_main () at arch/aarch64/entry.S:116
>> (gdb) c
>> Continuing.
>>
>> Breakpoint 1, if_attach_internal (vmove=0, ifp=0xffffa0004088ec00) at
>> bsd/sys/net/if.cc:1187
>> 1187                mtx_init(&ifa->ifa_mtx, "bsd_ifaddr", NULL, MTX_DEF);
>> (gdb) bt
>> #0  if_attach_internal (vmove=0, ifp=0xffffa0004088ec00) at
>> bsd/sys/net/if.cc:1187
>> #1  if_attach (ifp=ifp@entry=0xffff900040961000) at bsd/sys/net/if.cc:519
>> #2  0x0000000fc0110314 in ether_ifattach (ifp=0xffff900040961000,
>> lla=lla@entry=0xffff90004096a248 "RT") at bsd/sys/net/if_ethersubr.cc:902
>> #3  0x0000000fc017790c in virtio::net::net (this=0xffff90004096a000,
>> dev=...) at drivers/virtio-net.cc:304
>> #4  0x0000000fc0177cdc in aligned_new<virtio::net,
>> virtio::virtio_device&> () at /usr/include/c++/10/new:175
>> #5  virtio::net::probe (dev=<optimized out>) at drivers/virtio-net.cc:931
>> #6  0x0000000fc016cfd8 in std::function<hw::hw_driver*
>> (hw::hw_device*)>::operator()(hw::hw_device*) const (__args#0=<optimized
>> out>, this=<optimized out>)
>>     at drivers/driver.cc:39
>> #7  operator() (dev=0xffffa0004096f300, __closure=0x200000100b10) at
>> drivers/driver.cc:40
>> #8  std::__invoke_impl<void,
>> hw::driver_manager::load_all()::<lambda(hw::hw_device*)>&, hw::hw_device*>
>> (__f=...) at /usr/include/c++/10/bits/invoke.h:60
>> #9  std::__invoke_r<void,
>> hw::driver_manager::load_all()::<lambda(hw::hw_device*)>&, hw::hw_device*>
>> (__fn=...) at /usr/include/c++/10/bits/invoke.h:153
>> #10 std::_Function_handler<void(hw::hw_device*),
>> hw::driver_manager::load_all()::<lambda(hw::hw_device*)> >::_M_invoke(const
>> std::_Any_data &, hw::hw_device *&&) (
>>     __functor=..., __args#0=<optimized out>) at
>> /usr/include/c++/10/bits/std_function.h:291
>> #11 0x0000000fc0169658 in std::function<void
>> (hw::hw_device*)>::operator()(hw::hw_device*) const (__args#0=<optimized
>> out>, this=0x200000100b10)
>>     at /usr/include/c++/10/bits/std_function.h:617
>> #12 hw::device_manager::for_each_device(std::function<void
>> (hw::hw_device*)>) (this=this@entry=0xffffa00040854b40, func=...) at
>> drivers/device.cc:67
>> #13 0x0000000fc016cd18 in hw::driver_manager::load_all 
>> (this=this@entry=0xffffa00040854f80)
>> at /usr/include/c++/10/new:175
>> #14 0x0000000fc017b7e8 in arch_init_drivers () at
>> arch/aarch64/arch-setup.cc:240
>> #15 0x0000000fc00dcf7c in do_main_thread (_main_args=0xffffa000406fb7f0)
>> at loader.cc:416
>> #16 0x0000000fc02b8eb4 in operator() (__closure=0xffffa00040861c00) at
>> libc/pthread.cc:116
>> #17 std::__invoke_impl<void, pthread_private::pthread::pthread(void*
>> (*)(void*), void*, sigset_t, const
>> pthread_private::thread_attr*)::<lambda()>&> (__f=...)
>>     at /usr/include/c++/10/bits/invoke.h:60
>> #18 std::__invoke_r<void, pthread_private::pthread::pthread(void*
>> (*)(void*), void*, sigset_t, const
>> pthread_private::thread_attr*)::<lambda()>&> (__fn=...)
>>     at /usr/include/c++/10/bits/invoke.h:153
>> #19 std::_Function_handler<void(),
>> pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const
>> pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const
>> std::_Any_data &) (__functor=...) at
>> /usr/include/c++/10/bits/std_function.h:291
>> #20 0x0000000fc0257444 in sched::thread::main (this=0xffffa00040860c00)
>> at core/sched.cc:1267
>> #21 sched::thread_main_c (t=0xffffa00040860c00) at
>> arch/aarch64/arch-switch.hh:162
>> #22 0x0000000fc017d788 in thread_main () at arch/aarch64/entry.S:116
>> (gdb) c
>> Continuing.
>>
>> Breakpoint 1, ifa_init (ifa=ifa@entry=0xffffa00040b0fc00) at
>> bsd/sys/net/if.cc:1187
>> 1187                mtx_init(&ifa->ifa_mtx, "bsd_ifaddr", NULL, MTX_DEF);
>> (gdb) bt
>> #0  ifa_init (ifa=ifa@entry=0xffffa00040b0fc00) at bsd/sys/net/if.cc:1187
>> #1  0x0000000fc01219c8 in in_control (so=so@entry=0x0, cmd=1077963034,
>> data=data@entry=0x200000100a70 "eth0", ifp=ifp@entry=0xffff900040961000,
>> td=td@entry=0x0)
>>     at bsd/sys/netinet/in.cc:388
>> #2  0x0000000fc0107554 in osv::start_if (if_name="eth0",
>> ip_addr="0.0.0.0", mask_addr="255.255.255.0") at
>> bsd/porting/networking.cc:113
>> #3  0x0000000fc00da234 in operator() (if_name="eth0",
>> __closure=0x200000100db0) at /usr/include/c++/10/ext/new_allocator.h:79
>> #4  std::__invoke_impl<void,
>> do_main_thread(void*)::<lambda(std::string)>&,
>> std::__cxx11::basic_string<char, std::char_traits<char>,
>> std::allocator<char> > > (__f=...)
>>     at /usr/include/c++/10/bits/invoke.h:60
>> #5  std::__invoke_r<void, do_main_thread(void*)::<lambda(std::string)>&,
>> std::__cxx11::basic_string<char, std::char_traits<char>,
>> std::allocator<char> > > (__fn=...)
>>     at /usr/include/c++/10/bits/invoke.h:153
>> #6  std::_Function_handler<void(std::__cxx11::basic_string<char,
>> std::char_traits<char>, std::allocator<char> >),
>> do_main_thread(void*)::<lambda(std::string)> >::_M_invoke(const
>> std::_Any_data &, std::__cxx11::basic_string<char, std::char_traits<char>,
>> std::allocator<char> > &&) (__functor=..., __args#0=...)
>>     at /usr/include/c++/10/bits/std_function.h:291
>> #7  0x0000000fc0107274 in std::function<void
>> (std::__cxx11::basic_string<char, std::char_traits<char>,
>> std::allocator<char> >)>::operator()(std::__cxx11::basic_string<char,
>> std::char_traits<char>, std::allocator<char> >) const (__args#0="",
>> this=0x200000100db0) at /usr/include/c++/10/bits/std_function.h:617
>> #8  osv::for_each_if(std::function<void (std::__cxx11::basic_string<char,
>> std::char_traits<char>, std::allocator<char> >)>) (func=...) at
>> bsd/porting/networking.cc:32
>> #9  0x0000000fc00dcfd4 in do_main_thread (_main_args=0xffffa000406fb7f0)
>> at /usr/include/c++/10/new:175
>> #10 0x0000000fc02b8eb4 in operator() (__closure=0xffffa00040861c00) at
>> libc/pthread.cc:116
>> #11 std::__invoke_impl<void, pthread_private::pthread::pthread(void*
>> (*)(void*), void*, sigset_t, const
>> pthread_private::thread_attr*)::<lambda()>&> (__f=...)
>>     at /usr/include/c++/10/bits/invoke.h:60
>> #12 std::__invoke_r<void, pthread_private::pthread::pthread(void*
>> (*)(void*), void*, sigset_t, const
>> pthread_private::thread_attr*)::<lambda()>&> (__fn=...)
>>     at /usr/include/c++/10/bits/invoke.h:153
>> #13 std::_Function_handler<void(),
>> pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const
>> pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const
>> std::_Any_data &) (__functor=...) at
>> /usr/include/c++/10/bits/std_function.h:291
>> #14 0x0000000fc0257444 in sched::thread::main (this=0xffffa00040860c00)
>> at core/sched.cc:1267
>> #15 sched::thread_main_c (t=0xffffa00040860c00) at
>> arch/aarch64/arch-switch.hh:162
>> #16 0x0000000fc017d788 in thread_main () at arch/aarch64/entry.S:116
>> (gdb) c
>> Continuing.
>>
>> Breakpoint 1, ifa_init (ifa=ifa@entry=0xffffa00040b0fc00) at
>> bsd/sys/net/if.cc:1187
>> 1187                mtx_init(&ifa->ifa_mtx, "bsd_ifaddr", NULL, MTX_DEF);
>> (gdb) bt
>> #0  ifa_init (ifa=ifa@entry=0xffffa00040b0fc00) at bsd/sys/net/if.cc:1187
>> #1  0x0000000fc01219c8 in in_control (so=so@entry=0x0, cmd=1077963034,
>> data=data@entry=0x200000100a70 "eth0", ifp=ifp@entry=0xffff900040961000,
>> td=td@entry=0x0)
>>     at bsd/sys/netinet/in.cc:388
>> #2  0x0000000fc0107554 in osv::start_if (if_name="eth0",
>> ip_addr="0.0.0.0", mask_addr="255.255.255.0") at
>> bsd/porting/networking.cc:113
>> #3  0x0000000fc00da234 in operator() (if_name="eth0",
>> __closure=0x200000100db0) at /usr/include/c++/10/ext/new_allocator.h:79
>> #4  std::__invoke_impl<void,
>> do_main_thread(void*)::<lambda(std::string)>&,
>> std::__cxx11::basic_string<char, std::char_traits<char>,
>> std::allocator<char> > > (__f=...)
>>     at /usr/include/c++/10/bits/invoke.h:60
>> #5  std::__invoke_r<void, do_main_thread(void*)::<lambda(std::string)>&,
>> std::__cxx11::basic_string<char, std::char_traits<char>,
>> std::allocator<char> > > (__fn=...)
>>     at /usr/include/c++/10/bits/invoke.h:153
>> #6  std::_Function_handler<void(std::__cxx11::basic_string<char,
>> std::char_traits<char>, std::allocator<char> >),
>> do_main_thread(void*)::<lambda(std::string)> >::_M_invoke(const
>> std::_Any_data &, std::__cxx11::basic_string<char, std::char_traits<char>,
>> std::allocator<char> > &&) (__functor=..., __args#0=...)
>>     at /usr/include/c++/10/bits/std_function.h:291
>> #7  0x0000000fc0107274 in std::function<void
>> (std::__cxx11::basic_string<char, std::char_traits<char>,
>> std::allocator<char> >)>::operator()(std::__cxx11::basic_string<char,
>> std::char_traits<char>, std::allocator<char> >) const (__args#0="",
>> this=0x200000100db0) at /usr/include/c++/10/bits/std_function.h:617
>> #8  osv::for_each_if(std::function<void (std::__cxx11::basic_string<char,
>> std::char_traits<char>, std::allocator<char> >)>) (func=...) at
>> bsd/porting/networking.cc:32
>> #9  0x0000000fc00dcfd4 in do_main_thread (_main_args=0xffffa000406fb7f0)
>> at /usr/include/c++/10/new:175
>> #10 0x0000000fc02b8eb4 in operator() (__closure=0xffffa00040861c00) at
>> libc/pthread.cc:116
>> #11 std::__invoke_impl<void, pthread_private::pthread::pthread(void*
>> (*)(void*), void*, sigset_t, const
>> pthread_private::thread_attr*)::<lambda()>&> (__f=...)
>>     at /usr/include/c++/10/bits/invoke.h:60
>> #12 std::__invoke_r<void, pthread_private::pthread::pthread(void*
>> (*)(void*), void*, sigset_t, const
>> pthread_private::thread_attr*)::<lambda()>&> (__fn=...)
>>     at /usr/include/c++/10/bits/invoke.h:153
>> #13 std::_Function_handler<void(),
>> pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const
>> pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const
>> std::_Any_data &) (__functor=...) at
>> /usr/include/c++/10/bits/std_function.h:291
>> #14 0x0000000fc0257444 in sched::thread::main (this=0xffffa00040860c00)
>> at core/sched.cc:1267
>> #15 sched::thread_main_c (t=0xffffa00040860c00) at
>> arch/aarch64/arch-switch.hh:162
>> #16 0x0000000fc017d788 in thread_main () at arch/aarch64/entry.S:116
>> (gdb) c
>> Continuing.
>>
>> Breakpoint 1, ifa_init (ifa=ifa@entry=0xffffa00040960400) at
>> bsd/sys/net/if.cc:1187
>> 1187                mtx_init(&ifa->ifa_mtx, "bsd_ifaddr", NULL, MTX_DEF);
>> (gdb) bt
>> #0  ifa_init (ifa=ifa@entry=0xffffa00040960400) at bsd/sys/net/if.cc:1187
>> #1  0x0000000fc01219c8 in in_control (so=so@entry=0x0, cmd=1077963034,
>> data=data@entry=0xffff800040d52d90 "eth0", ifp=ifp@entry=0xffff900040961000,
>> td=td@entry=0x0)
>>     at bsd/sys/netinet/in.cc:388
>> #2  0x0000000fc0107554 in osv::start_if (if_name="eth0",
>> ip_addr="192.168.122.15", mask_addr="255.255.255.0") at
>> bsd/porting/networking.cc:113
>> #3  0x0000000fc027b05c in dhcp::dhcp_interface_state::state_request
>> (this=this@entry=0xffffa0004096bc40, dm=...) at
>> /usr/include/c++/10/bits/char_traits.h:322
>> #4  0x0000000fc027c3d0 in dhcp::dhcp_interface_state::process_packet
>> (this=0xffffa0004096bc40, m=m@entry=0xffffa00040c7c100) at
>> core/dhcp.cc:633
>> #5  0x0000000fc027c540 in dhcp::dhcp_worker::dhcp_worker_fn
>> (this=0xfc06aab30 <net_dhcp_worker>) at core/dhcp.cc:843
>> #6  0x0000000fc027c5e0 in operator() (__closure=<optimized out>,
>> __closure=<optimized out>) at core/dhcp.cc:773
>> #7  std::__invoke_impl<void, dhcp::dhcp_worker::init()::<lambda()>&>
>> (__f=...) at /usr/include/c++/10/bits/invoke.h:60
>> #8  std::__invoke_r<void, dhcp::dhcp_worker::init()::<lambda()>&>
>> (__fn=...) at /usr/include/c++/10/bits/invoke.h:153
>> #9  std::_Function_handler<void(), dhcp::dhcp_worker::init()::<lambda()>
>> >::_M_invoke(const std::_Any_data &) (__functor=...)
>>     at /usr/include/c++/10/bits/std_function.h:291
>> #10 0x0000000fc0257444 in sched::thread::main (this=0xffffa00040b08800)
>> at core/sched.cc:1267
>> #11 sched::thread_main_c (t=0xffffa00040b08800) at
>> arch/aarch64/arch-switch.hh:162
>> #12 0x0000000fc017d788 in thread_main () at arch/aarch64/entry.S:116
>>
>> Ideally, as a "good citizen" we would want to free what we allocated but
>> given that even now these 5 calls to allocate bsd_ifaddr (mostly from
>> osv::start_if()) are not balanced with corresponding ones to free
>> bsd_ifaddr, possibly because we lack all necessary calls to osv::stop_if()
>> (only one in core/dhcp.cc). But given there is only one occurrence of
>> freeing bsd_ifaddr(), disabling reference counting logic would only lead to
>> a single leak of memory (roughly 1024 bytes). These structures are
>> long-lived and map to a handful of IPs/interfaces (lo, eth0, ...) OSv would
>> have.
>>
>> There is also extra benefit: the refcount_release()/refcount_acquire()
>> functions called by ifa_free()/ifa_ref() are not free - they operate on
>> atomics and probably lead to some contention in SMP case. I have not made
>> any measurements but I would assume the tcp stack could become a little bit
>> faster by not using atomics at all (there are 33 calls to ifa_ref() and 55
>> to ifa_free() in bsd subtree). For example, the refcount_acquire() calls
>> this inline assembly:
>>
>> static __inline u_int atomic_fetchadd_int(volatile u_int *p, u_int val)
>> {
>>     u_int result;
>>     u_int status;
>>     __asm __volatile("1: ldaxr %w0, %1 ; "
>>                      "   add   %w2, %w2, %w0 ; "
>>                      "   stlxr %w3, %w2, %1 ; "
>>                      "   cbnz  %w3, 1b ; "
>>                      : "=&r"(result), "+Q"(*p), "+r"(val), "=&r"(status));
>>
>>     return result;
>> }
>>
>> This is actually a loop likely to happen in SMP contentious cases. See
>> this article - https://cpufun.substack.com/p/atomics-in-aarch64?s=r.
>>
>> After applying the patch (see above) to disable ifa_refcnt counting
>> (ifa_ref, ifa_free do nothing) and running all 4 tst-bsd-tcp-* tests with
>> 100K iterations on 1, 2, and 4 CPUs on both aarch64 and x86_64 I could not
>> reproduce these crashes and "hang" scenarios.
>>
>> What do you think about my idea?
>>
>> Waldek
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "OSv Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/osv-dev/8a8cb526-6d52-410a-81f3-91169dd1d6c3n%40googlegroups.com
> <https://groups.google.com/d/msgid/osv-dev/8a8cb526-6d52-410a-81f3-91169dd1d6c3n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/CANEVyjs3iEPaf7Yqc01Kq%2BcqTwticPijt6eXKo2yPVSvfcUarg%40mail.gmail.com.

Reply via email to