Hello David,

On 21/12/22(Wed) 11:37, David Hill wrote:
> On 12/21/22 11:23, Martin Pieuchot wrote:
> > On 21/12/22(Wed) 09:20, David Hill wrote:
> > > On 12/21/22 07:08, David Hill wrote:
> > > > On 12/21/22 05:33, Martin Pieuchot wrote:
> > > > > On 18/12/22(Sun) 20:55, Martin Pieuchot wrote:
> > > > > > On 17/12/22(Sat) 14:15, David Hill wrote:
> > > > > > > On 10/28/22 03:46, Renato Aguiar wrote:
> > > > > > > > Use of the bbolt Go library causes 7.2 to freeze.  I suspect it
> > > > > > > > is triggering some sort of deadlock in mmap, because threads get
> > > > > > > > stuck at vmmaplk.
> > > > > > > > 
> > > > > > > > I managed to reproduce it consistently on a laptop with 4 cores
> > > > > > > > (i5-1135G7), using one unit test from bbolt:
> > > > > > > > 
> > > > > > > >      $ doas pkg_add git go
> > > > > > > >      $ git clone https://github.com/etcd-io/bbolt.git
> > > > > > > >      $ cd bbolt
> > > > > > > >      $ git checkout v1.3.6
> > > > > > > >      $ go test -v -run TestSimulate_10000op_10p
> > > > > > > > 
> > > > > > > > The test never ends and this is the 'top' report:
> > > > > > > > 
> > > > > > > >      PID      TID PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND
> > > > > > > >    32181   438138 -18    0   57M   13M idle      uvn_fls   0:00  0.00% bbolt.test
> > > > > > > >    32181   331169  10    0   57M   13M sleep/1   nanoslp   0:00  0.00% bbolt.test
> > > > > > > >    32181   497390  10    0   57M   13M idle      vmmaplk   0:00  0.00% bbolt.test
> > > > > > > >    32181   380477  14    0   57M   13M idle      vmmaplk   0:00  0.00% bbolt.test
> > > > > > > >    32181   336950  14    0   57M   13M idle      vmmaplk   0:00  0.00% bbolt.test
> > > > > > > >    32181   491043  14    0   57M   13M idle      vmmaplk   0:00  0.00% bbolt.test
> > > > > > > >    32181   347071   2    0   57M   13M idle      kqread    0:00  0.00% bbolt.test
> > > > > > > > 
> > > > > > > > After this, most commands just hang. For example,
> > > > > > > > running a 'ps | grep foo' in
> > > > > > > > another shell would do it.
> > > > > > > > 
> > > > > > > 
> > > > > > > I can reproduce this on MP, but not SP.  Here is /trace from ddb
> > > > > > > after using the ddb.trigger sysctl.  Is there any other
> > > > > > > information I could pull from DDB that may help?
> > > > > > 
> > > > > > Thanks for the useful report David!
> > > > > > 
> > > > > > The issue seems to be a deadlock between the `vmmaplk' and a
> > > > > > particular `vmobjlock'.  uvm_map_clean() calls uvn_flush() which
> > > > > > sleeps with the `vmmaplk' held.
> > > > > > 
> > > > > > I'll think a bit about this and try to come up with a fix ASAP.
> > > > > 
> > > > > I'm missing a piece of information.  All the threads in your report
> > > > > seem to want a read version of the `vmmaplk', so they should not
> > > > > block.  Could you reproduce the hang with a WITNESS kernel and print
> > > > > 'show all locks' in addition to all the information you've reported?
> > > > > 
> > > > 
> > > > Sure.  It's always the same: 2 processes (sysctl and bbolt.test) and 3
> > > > locks (sysctllk, kernel_lock, and vmmaplk), with bbolt.test always on
> > > > the uvn_flsh thread.
> > > > 
> > > > 
> > > > Process 98301 (sysctl) thread 0xfff......
> > > > exclusive rwlock sysctllk r = 0 (0xfffff...)
> > > > exclusive kernel_lock &kernel_lock r = 0 (0xffffff......)
> > > > Process 32181 (bbolt.test) thread (0xffffff...) (438138)
> > > > shared rwlock vmmaplk r = 0 (0xfffff......)
> > > > 
> > > > To reproduce, just do:
> > > > $ doas pkg_add git go
> > > > $ git clone https://github.com/etcd-io/bbolt.git
> > > > $ cd bbolt
> > > > $ git checkout v1.3.6
> > > > $ go test -v -run TestSimulate_10000op_10p
> > > > 
> > > > The test will hang almost instantly.
> > > > 
> > > 
> > > Not sure if this is a hint...
> > > 
> > > https://github.com/etcd-io/bbolt/blob/master/db.go#L27-L31
> > > 
> > > // IgnoreNoSync specifies whether the NoSync field of a DB is ignored when
> > > // syncing changes to a file.  This is required as some operating systems,
> > > // such as OpenBSD, do not have a unified buffer cache (UBC) and writes
> > > // must be synchronized using the msync(2) syscall.
> > > const IgnoreNoSync = runtime.GOOS == "openbsd"
> > 
> > Yes, the issue is related to sync(2).  Could you try the diff below (it
> > is not a fix) and tell me if you can reproduce the issue with it?  I
> > can't.
> 
> Ran it 20 times and all completed and passed.  I was also able to
> interrupt it without issue.
> 
Excellent!

Here's the best fix I could come up with.  We mark the VM map as "busy"
during the page fault, just before the faulting thread releases the shared
lock.  This ensures that no other thread can grab an exclusive lock on the
map until the fault has finished.

I couldn't trigger the reproducer with this, can you?
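
To illustrate the idea outside the kernel, here is a minimal userland
sketch of the busy-marker pattern.  Illustrative only: the names and the
pthread-based implementation below are made up for this mail; the real
vm_map_busy()/vm_map_unbusy() in uvm_map.c differ in detail.

/*
 * Sketch of the "busy map" idea in userland terms (not the uvm code).
 */
#include <pthread.h>

struct busymap {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	int		busy;	/* nonzero while a fault is in flight */
};

/* Faulting thread: call while still holding the shared map lock,
 * like the vm_map_busy() before uvmfault_unlockall() in the diff. */
void
busymap_busy(struct busymap *bm)
{
	pthread_mutex_lock(&bm->mtx);
	bm->busy = 1;
	pthread_mutex_unlock(&bm->mtx);
}

/* Faulting thread: call exactly once on every path out of the window
 * where the map lock was dropped, like vm_map_unbusy() in the diff. */
void
busymap_unbusy(struct busymap *bm)
{
	pthread_mutex_lock(&bm->mtx);
	bm->busy = 0;
	pthread_cond_broadcast(&bm->cv);
	pthread_mutex_unlock(&bm->mtx);
}

A thread that wants the map exclusively has to wait for the marker to be
cleared; see the writer-side sketch after the diff.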

Index: uvm/uvm_fault.c
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
retrieving revision 1.133
diff -u -p -r1.133 uvm_fault.c
--- uvm/uvm_fault.c     4 Nov 2022 09:36:44 -0000       1.133
+++ uvm/uvm_fault.c     20 Jan 2023 13:52:58 -0000
@@ -1277,6 +1277,9 @@ uvm_fault_lower(struct uvm_faultinfo *uf
                /* update rusage counters */
                curproc->p_ru.ru_majflt++;
 
+               /* prevent siblings from grabbing an exclusive lock on the map */
+               vm_map_busy(ufi->map);
+
                uvmfault_unlockall(ufi, amap, NULL);
 
                counters_inc(uvmexp_counters, flt_get);
@@ -1293,13 +1296,16 @@ uvm_fault_lower(struct uvm_faultinfo *uf
                        KASSERT(result != VM_PAGER_PEND);
 
                        if (result == VM_PAGER_AGAIN) {
+                               vm_map_unbusy(ufi->map);
                                tsleep_nsec(&nowake, PVM, "fltagain2",
                                    MSEC_TO_NSEC(5));
                                return ERESTART;
                        }
 
-                       if (!UVM_ET_ISNOFAULT(ufi->entry))
+                       if (!UVM_ET_ISNOFAULT(ufi->entry)) {
+                               vm_map_unbusy(ufi->map);
                                return (EIO);
+                       }
 
                        uobjpage = PGO_DONTCARE;
                        uobj = NULL;
@@ -1308,6 +1314,7 @@ uvm_fault_lower(struct uvm_faultinfo *uf
 
                /* re-verify the state of the world.  */
                locked = uvmfault_relock(ufi);
+               vm_map_unbusy(ufi->map);
                if (locked && amap != NULL)
                        amap_lock(amap);
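
Note the pairing in the diff above: vm_map_busy() is taken while the
shared lock is still held, and every path out of the unlocked window (the
VM_PAGER_AGAIN retry, the EIO return and the common relock path) unbusies
the map exactly once.  The writer side of the userland sketch would look
roughly like this (again illustrative, not the actual vm_map_lock() code).
The writer must not hold the write lock while waiting, because the
faulting thread relocks before it unbusies:

/*
 * Take the exclusive lock only when no fault is in flight, re-checking
 * after acquiring it since a new fault may have started in between.
 */
void
busymap_lock_write(struct busymap *bm, pthread_rwlock_t *l)
{
	for (;;) {
		pthread_mutex_lock(&bm->mtx);
		while (bm->busy)
			pthread_cond_wait(&bm->cv, &bm->mtx);
		pthread_mutex_unlock(&bm->mtx);

		pthread_rwlock_wrlock(l);
		pthread_mutex_lock(&bm->mtx);
		if (bm->busy == 0) {
			pthread_mutex_unlock(&bm->mtx);
			return;	/* write-locked, no fault in flight */
		}
		/* A fault started before we got the write lock; retry. */
		pthread_mutex_unlock(&bm->mtx);
		pthread_rwlock_unlock(l);
	}
}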
 
