Re: race condition in libports

2018-01-05 Thread Samuel Thibault
Samuel Thibault, on sam. 06 janv. 2018 02:03:11 +0100, wrote:
> Brent W. Baccala, on ven. 05 janv. 2018 19:47:37 -0500, wrote:
> > Well, in that case, perhaps it should work by checking to see if the port 
> > is in
> > the hash, rather than by looking at how many weak references it's got. 
> > Something more like this:
> > 
> >   if ((refcounts_hard_references(>pi.refcounts) == 0)
> >       && (* i->id_hashloc == i))
> >     {
> >       /* Nobody got a send right in between, we can remove from the hash.  
> > */
> >       hurd_ihash_locp_remove (, i->id_hashloc);
> >       ports_port_deref_weak (>pi);
> >     }
> 
> You can't afford dereferencing id_hashloc like that: it could have
> gotten away in between.
> 
> Could you try the attached patch instead?

Actually that's what libdiskfs/node-cache.c does :)

Samuel



Re: race condition in libports

2018-01-05 Thread Samuel Thibault
Hello,

Brent W. Baccala, on ven. 05 janv. 2018 19:47:37 -0500, wrote:
> Well, in that case, perhaps it should work by checking to see if the port is 
> in
> the hash, rather than by looking at how many weak references it's got. 
> Something more like this:
> 
>   if ((refcounts_hard_references(>pi.refcounts) == 0)
>       && (* i->id_hashloc == i))
>     {
>       /* Nobody got a send right in between, we can remove from the hash.  */
>       hurd_ihash_locp_remove (, i->id_hashloc);
>       ports_port_deref_weak (>pi);
>     }

You can't afford dereferencing id_hashloc like that: it could have
gotten away in between.

Could you try the attached patch instead?

> Also, what documentation?  The hurd texinfo file?  That's the only place that 
> I
> know of where this is documented.

Yes: “If nonzero, @var{dropweak_routine} will be called to request
weak references to be dropped.” It doesn't say "one weak reference", but
"weak references".

Samuel
diff --git a/libfshelp/get-identity.c b/libfshelp/get-identity.c
index f88e0f82..ac1b4f0b 100644
--- a/libfshelp/get-identity.c
+++ b/libfshelp/get-identity.c
@@ -70,10 +70,12 @@ id_clean (void *cookie)
 {
   struct idspec *i = cookie;
   pthread_mutex_lock ();
-  if (refcounts_hard_references(>pi.refcounts) == 0)
+  if (refcounts_hard_references(>pi.refcounts) == 0
+  && i->id_hashloc != NULL)
 {
-  /* Nobody got a send right in between, we can remove from the hash.  */
+  /* Nobody got a send right in between, we can remove i from the hash.  */
   hurd_ihash_locp_remove (, i->id_hashloc);
+  i->id_hashloc = NULL;
   ports_port_deref_weak (>pi);
 }
   pthread_mutex_unlock ();


Re: race condition in libports

2018-01-05 Thread Brent W. Baccala
On Fri, Jan 5, 2018 at 7:24 PM, Samuel Thibault 
wrote:

> Hello,
>
> Brent W. Baccala, on ven. 05 janv. 2018 17:45:57 -0500, wrote:
> > I've "fixed" this by making sure we don't remove the hash table entry
> unless
> > there are exactly two weak references outstanding, but I'm not sure
> that's the
> > best way to handle it.  It doesn't seem like the dropweak routine should
> have
> > to be so careful; it shouldn't get called twice like that.
>
> Well, AIUI from the documentation, dropweak should manage by itself how
> many weak references it should drop. I.e. the number of times dropweak
> is called doesn't matter, the first call should drop them, and later
> calls should be fine that there aren't any to drop any more. So it seems
> to me that your proposed fix is correct. I just rewrote it to make sure
> to get coherent counts between hard and weak.


Well, in that case, perhaps it should work by checking to see if the port
is in the hash, rather than by looking at how many weak references it's
got.  Something more like this:

  if ((refcounts_hard_references(>pi.refcounts) == 0)
  && (* i->id_hashloc == i))
{
  /* Nobody got a send right in between, we can remove from the hash.
*/
  hurd_ihash_locp_remove (, i->id_hashloc);
  ports_port_deref_weak (>pi);
}

Also, what documentation?  The hurd texinfo file?  That's the only place
that I know of where this is documented.

agape
brent


Re: race condition in libports

2018-01-05 Thread Samuel Thibault
Hello,

Brent W. Baccala, on ven. 05 janv. 2018 17:45:57 -0500, wrote:
> I've "fixed" this by making sure we don't remove the hash table entry unless
> there are exactly two weak references outstanding, but I'm not sure that's the
> best way to handle it.  It doesn't seem like the dropweak routine should have
> to be so careful; it shouldn't get called twice like that.

Well, AIUI from the documentation, dropweak should manage by itself how
many weak references it should drop. I.e. the number of times dropweak
is called doesn't matter, the first call should drop them, and later
calls should be fine that there aren't any to drop any more. So it seems
to me that your proposed fix is correct. I just rewrote it to make sure
to get coherent counts between hard and weak.

Samuel



Re: race condition in libports

2018-01-05 Thread Samuel Thibault
Thanks for the proposed fix, I reused existing code to commit them!

Samuel



Re: race condition in libports

2018-01-05 Thread Samuel Thibault
Samuel Thibault, on ven. 05 janv. 2018 23:54:49 +0100, wrote:
> Brent W. Baccala, on ven. 05 janv. 2018 17:45:57 -0500, wrote:
> error_t
> fshelp_get_identity (struct port_bucket *bucket,
> .   .    ino_t fileno,
> .   .    mach_port_t *pt)
> 
> >if (!idclass)
> >  id_initialize ();
> >  
> > -  i = hurd_ihash_find (, (hurd_ihash_key_t) fileno);
> > +  i = hurd_ihash_find (, (hurd_ihash_key_t) );
> 
> That can't mean anything

Ah, sorry, I read too fast, didn't notice the hash functions.

Samuel



Re: race condition in libports

2018-01-05 Thread Samuel Thibault
Brent W. Baccala, on ven. 05 janv. 2018 17:45:57 -0500, wrote:
error_t
fshelp_get_identity (struct port_bucket *bucket,
.   .    ino_t fileno,
.   .    mach_port_t *pt)

>if (!idclass)
>  id_initialize ();
>  
> -  i = hurd_ihash_find (, (hurd_ihash_key_t) fileno);
> +  i = hurd_ihash_find (, (hurd_ihash_key_t) );

That can't mean anything: fileno is a parameter, its adresse is
basically random.

Samuel



Re: race condition in libports

2018-01-05 Thread Brent W. Baccala
On Sun, Dec 17, 2017 at 11:41 PM, Brent W. Baccala 
wrote:

> On Sun, Dec 17, 2017 at 8:02 AM, Samuel Thibault 
> wrote:
>
>> Hello,
>>
>> Brent W. Baccala, on sam. 16 déc. 2017 21:37:05 -0500, wrote:
>> > basically, we're storing ports in a inode-to-port hash, looking them
>> > up when io_identity() gets called, and removing them from the hash
>> > when the class's clean routine gets called.
>>
>> That's the bug: one needs a reference for this.  And it's a weak
>> reference: identity is fine with getting rid of it.  Could you try the
>> attached patch?
>
>
> Makes sense.  I've applied it.  So far, so good.
>

I'm still having problems with this, even with the stock ext2fs/libpager
code.  To reproduce these bugs, I suggest making a copy of the virtual disk
(hd0 -> hd1), booting a subhurd on hd1, then trying to build the entire
hurd package (dpkg-buildpackage -b).  I've yet to successfully build the
hurd packages in a subhurd - there's always some bug or another that gets
triggered.  If not this one, then I've been having problems with the proc
server.

Back to the io_identity code...

Part of the problem is that sizeof(ino_t) is 8 bytes, while
sizeof(hurd_ihash_key_t) is only 4 bytes.  So we need to use the
"generalized key interface" to hash the full 8 bytes.

But that doesn't fix everything.  I'm seeing periodic double removal from
the hash table.  It originally manifested as an underflow on the weak
reference count, but I added a line:

   assert_backtrace(hurd_ihash_value_valid(* i->id_hashloc));

right before the hurd_ihash_locp_remove in id_clean, and this assert
periodically triggers.

The most notable thing about it is that it always seems to be triggered by
a NO SENDERS notification with a mscount field less than the mscount in the
portinfo structure:

#8  0x0807ed9d in __assert_fail_backtrace (assertion=0x817c080
"hurd_ihash_value_valid(* i->id_hashloc)",
file=0x817c03c
"/root/hurd-0.9.git20171119/build-deb/../libfshelp/get-identity.c",
line=60,
function=0x817c0a8 <__PRETTY_FUNCTION__.8282> "id_clean") at
./build-deb/../libshouldbeinlibc/assert-backtrace.c:61
#9  0x08073d61 in id_clean (cookie=0x20550ec0) at
./build-deb/../libfshelp/get-identity.c:60
#10 0x0807a20e in ports_port_deref (portstruct=0x20550ec0) at
./build-deb/../libports/port-deref.c:39
#11 0x0807a9bd in internal_demuxer (outheadp=0x22800ef0, inp=0x22802f00) at
./build-deb/../libports/manage-multithread.c:215
#12 synchronized_demuxer (inp=0x22802f00, outheadp=0x22800ef0) at
./build-deb/../libports/manage-multithread.c:239

(gdb) frame 11
#11 0x0807a9bd in internal_demuxer (outheadp=0x22800ef0, inp=0x22802f00) at
./build-deb/../libports/manage-multithread.c:215
215   ports_port_deref (pi);
(gdb) x/8x inp
0x22802f00: 0x1700  0x0020  0x  0x20550ec0
0x22802f10: 0x  0x0046  0x10012002  0x0002

(gdb) print *pi
$9 = {class = 0x209ae8, refcounts = {references = {hard = 0, weak = 1},
value = 4294967296}, mscount = 24,
  cancel_threshold = 0, flags = 0, port_right = 22723, current_rpcs = 0x0,
bucket = 0x820e628, hentry = 0x84a7cf8,
  ports_htable_entry = 0x205b3610}

Notice that the NO SENDER message had an mscount of 2 (the last word in the
x/8x), but the portinfo structure has an mscount of 24.

Also, there's no hard references and one weak reference in the portinfo
structure.  At this point there should be two weak references - one for the
hash table entry and one for the demuxer.  If the hash table assert didn't
fail, then we'd be getting a weak reference underflow soon.

I've puzzled over this for days, and this is my current theory for what's
happening.

We started with one hard reference (because we have outstanding send
rights) and one weak reference (the hash table).  Then we got a NO SENDERS
message, so we started to drop the hard reference by demoting it to a weak
reference, in libports/port-deref.c.  Now we've got no hards, two weaks,
the entry still in the hash table, and nothing locked (we're about to call
the dropweak_routine).

At this point, other threads preempt us.  Our portinfo structure is still
in the hash table, so more send rights get created (22 more, in the gdb
trace above!), they get consumed, and another NO SENDERS message gets
generated.  It gets to the dropweak_routine with no hard references and
three weak references (the two from before, plus one more from the second
NO SENDERS' demuxer).  It removes the hash table entry (dropping one weak
reference), drops another weak reference (in ports_port_deref), and
finishes running.

Now the first thread (finally) gets to run again.  It's now got no hard
references, one weak reference, and the dropweak_routine runs on a portinfo
structure that's already been removed from the hash table.  It removes the
hash table entry (again), drops the last weak reference, and then tries to
drop another weak reference in ports_port_deref, which 

Re: race condition in libports

2017-12-17 Thread Brent W. Baccala
On Sun, Dec 17, 2017 at 8:02 AM, Samuel Thibault 
wrote:

> Hello,
>
> Brent W. Baccala, on sam. 16 déc. 2017 21:37:05 -0500, wrote:
> > basically, we're storing ports in a inode-to-port hash, looking them
> > up when io_identity() gets called, and removing them from the hash
> > when the class's clean routine gets called.
>
> That's the bug: one needs a reference for this.  And it's a weak
> reference: identity is fine with getting rid of it.  Could you try the
> attached patch?


Makes sense.  I've applied it.  So far, so good.

Now I feel stupid!  Holding a port in a hash is such a classic example of a
weak reference, but I didn't even think about moving that code out of the
clean function.

Thank you.

agape
brent


Re: race condition in libports

2017-12-17 Thread Samuel Thibault
Hello,

Brent W. Baccala, on sam. 16 déc. 2017 21:37:05 -0500, wrote:
> basically, we're storing ports in a inode-to-port hash, looking them
> up when io_identity() gets called, and removing them from the hash
> when the class's clean routine gets called.

That's the bug: one needs a reference for this.  And it's a weak
reference: identity is fine with getting rid of it.  Could you try the
attached patch?

Samuel
Index: hurd-debian/libfshelp/get-identity.c
===
--- hurd-debian.orig/libfshelp/get-identity.c
+++ hurd-debian/libfshelp/get-identity.c
@@ -42,7 +43,12 @@ id_clean (void *cookie)
 {
   struct idspec *i = cookie;
   pthread_mutex_lock ();
-  hurd_ihash_locp_remove (, i->id_hashloc);
+  if (refcounts_hard_references(>pi.refcounts) == 0)
+{
+  /* Nobody got a send right in between, we can remove from the hash.  */
+  hurd_ihash_locp_remove (, i->id_hashloc);
+  ports_port_deref_weak (>pi);
+}
   pthread_mutex_unlock ();
 }
 
@@ -50,7 +56,7 @@ static void
 id_initialize ()
 {
   assert_backtrace (!idclass);
-  idclass = ports_create_class (id_clean, NULL);
+  idclass = ports_create_class (NULL, id_clean);
 }
 
 error_t
@@ -75,6 +81,9 @@ fshelp_get_identity (struct port_bucket
   if (err)
 goto lose_port;
 
+  /* Weak reference for the hash entry.  */
+  ports_port_ref_weak(>pi);
+
   *pt = ports_get_right (i);
   ports_port_deref (i);
 }


race condition in libports

2017-12-16 Thread Brent W. Baccala
Hi -

I'm making good progress on the multi-client libpager.  I've been running
it on my root filesystem for about a month now, with few problems recently.

However, there are still some bugs.  One seems to be in libports.  It
manifests like this:

/hurd/ext2fs.static: ../../libports/../libshouldbeinlibc/refcount.h:171:
refcounts_ref: Assertion '! (r.hard == 1 && r.weak == 0) || !"refcount
detected use-after-free!"' failed.
/hurd/ext2fs.static: ../../libports/complete-deallocate.c:41:
_ports_complete_deallocate: Assertion '! "reacquired reference w/o send
rights"' failed.

gdb indicates that the port in question was generated by
libfshelp/get-identity.c.  That file's a short read; basically, we're
storing ports in a inode-to-port hash, looking them up when io_identity()
gets called, and removing them from the hash when the class's clean routine
gets called.

I think what's happening is that we have a port that loses its last send
right, and after its refcount is decremented but before its clean routine
gets called, another call to io_identity() pulls it out of the hash.  Then
you've got ports_get_right complaining (that's the first line) that it's
incrementing a zero refcount, and ports_port_deref complaining (that's the
second line) that it deallocating a port that now has send rights.

Looking at the tail end of libports/no-senders.c, you'll see that
ports_port_deref gets called after we've dropped the mutex on _ports_lock.
I'm thinking that we need to hold that mutex all the way until the class's
clean routine has returned in order to assure that the refcount get
decremented and the port gets removed from the hash atomically.

Of course, that requires holding a global lock while the clean routine
runs.  It seems to me that only the port in question needs to be locked,
but the individual ports don't seem to have mutexs associated with them.

Any ideas what to do?

agape
brent