Re: race condition in libports
Samuel Thibault, on sam. 06 janv. 2018 02:03:11 +0100, wrote: > Brent W. Baccala, on ven. 05 janv. 2018 19:47:37 -0500, wrote: > > Well, in that case, perhaps it should work by checking to see if the port > > is in > > the hash, rather than by looking at how many weak references it's got. > > Something more like this: > > > > if ((refcounts_hard_references(>pi.refcounts) == 0) > > && (* i->id_hashloc == i)) > > { > > /* Nobody got a send right in between, we can remove from the hash. > > */ > > hurd_ihash_locp_remove (, i->id_hashloc); > > ports_port_deref_weak (>pi); > > } > > You can't afford dereferencing id_hashloc like that: it could have > gotten away in between. > > Could you try the attached patch instead? Actually that's what libdiskfs/node-cache.c does :) Samuel
Re: race condition in libports
Hello, Brent W. Baccala, on ven. 05 janv. 2018 19:47:37 -0500, wrote: > Well, in that case, perhaps it should work by checking to see if the port is > in > the hash, rather than by looking at how many weak references it's got. > Something more like this: > > if ((refcounts_hard_references(>pi.refcounts) == 0) > && (* i->id_hashloc == i)) > { > /* Nobody got a send right in between, we can remove from the hash. */ > hurd_ihash_locp_remove (, i->id_hashloc); > ports_port_deref_weak (>pi); > } You can't afford dereferencing id_hashloc like that: it could have gotten away in between. Could you try the attached patch instead? > Also, what documentation? The hurd texinfo file? That's the only place that > I > know of where this is documented. Yes: “If nonzero, @var{dropweak_routine} will be called to request weak references to be dropped.” It doesn't say "one weak reference", but "weak references". Samuel diff --git a/libfshelp/get-identity.c b/libfshelp/get-identity.c index f88e0f82..ac1b4f0b 100644 --- a/libfshelp/get-identity.c +++ b/libfshelp/get-identity.c @@ -70,10 +70,12 @@ id_clean (void *cookie) { struct idspec *i = cookie; pthread_mutex_lock (); - if (refcounts_hard_references(>pi.refcounts) == 0) + if (refcounts_hard_references(>pi.refcounts) == 0 + && i->id_hashloc != NULL) { - /* Nobody got a send right in between, we can remove from the hash. */ + /* Nobody got a send right in between, we can remove i from the hash. */ hurd_ihash_locp_remove (, i->id_hashloc); + i->id_hashloc = NULL; ports_port_deref_weak (>pi); } pthread_mutex_unlock ();
Re: race condition in libports
On Fri, Jan 5, 2018 at 7:24 PM, Samuel Thibaultwrote: > Hello, > > Brent W. Baccala, on ven. 05 janv. 2018 17:45:57 -0500, wrote: > > I've "fixed" this by making sure we don't remove the hash table entry > unless > > there are exactly two weak references outstanding, but I'm not sure > that's the > > best way to handle it. It doesn't seem like the dropweak routine should > have > > to be so careful; it shouldn't get called twice like that. > > Well, AIUI from the documentation, dropweak should manage by itself how > many weak references it should drop. I.e. the number of times dropweak > is called doesn't matter, the first call should drop them, and later > calls should be fine that there aren't any to drop any more. So it seems > to me that your proposed fix is correct. I just rewrote it to make sure > to get coherent counts between hard and weak. Well, in that case, perhaps it should work by checking to see if the port is in the hash, rather than by looking at how many weak references it's got. Something more like this: if ((refcounts_hard_references(>pi.refcounts) == 0) && (* i->id_hashloc == i)) { /* Nobody got a send right in between, we can remove from the hash. */ hurd_ihash_locp_remove (, i->id_hashloc); ports_port_deref_weak (>pi); } Also, what documentation? The hurd texinfo file? That's the only place that I know of where this is documented. agape brent
Re: race condition in libports
Hello, Brent W. Baccala, on ven. 05 janv. 2018 17:45:57 -0500, wrote: > I've "fixed" this by making sure we don't remove the hash table entry unless > there are exactly two weak references outstanding, but I'm not sure that's the > best way to handle it. It doesn't seem like the dropweak routine should have > to be so careful; it shouldn't get called twice like that. Well, AIUI from the documentation, dropweak should manage by itself how many weak references it should drop. I.e. the number of times dropweak is called doesn't matter, the first call should drop them, and later calls should be fine that there aren't any to drop any more. So it seems to me that your proposed fix is correct. I just rewrote it to make sure to get coherent counts between hard and weak. Samuel
Re: race condition in libports
Thanks for the proposed fix, I reused existing code to commit them! Samuel
Re: race condition in libports
Samuel Thibault, on ven. 05 janv. 2018 23:54:49 +0100, wrote: > Brent W. Baccala, on ven. 05 janv. 2018 17:45:57 -0500, wrote: > error_t > fshelp_get_identity (struct port_bucket *bucket, > . . ino_t fileno, > . . mach_port_t *pt) > > >if (!idclass) > > id_initialize (); > > > > - i = hurd_ihash_find (, (hurd_ihash_key_t) fileno); > > + i = hurd_ihash_find (, (hurd_ihash_key_t) ); > > That can't mean anything Ah, sorry, I read too fast, didn't notice the hash functions. Samuel
Re: race condition in libports
Brent W. Baccala, on ven. 05 janv. 2018 17:45:57 -0500, wrote: error_t fshelp_get_identity (struct port_bucket *bucket, . . ino_t fileno, . . mach_port_t *pt) >if (!idclass) > id_initialize (); > > - i = hurd_ihash_find (, (hurd_ihash_key_t) fileno); > + i = hurd_ihash_find (, (hurd_ihash_key_t) ); That can't mean anything: fileno is a parameter, its adresse is basically random. Samuel
Re: race condition in libports
On Sun, Dec 17, 2017 at 11:41 PM, Brent W. Baccalawrote: > On Sun, Dec 17, 2017 at 8:02 AM, Samuel Thibault > wrote: > >> Hello, >> >> Brent W. Baccala, on sam. 16 déc. 2017 21:37:05 -0500, wrote: >> > basically, we're storing ports in a inode-to-port hash, looking them >> > up when io_identity() gets called, and removing them from the hash >> > when the class's clean routine gets called. >> >> That's the bug: one needs a reference for this. And it's a weak >> reference: identity is fine with getting rid of it. Could you try the >> attached patch? > > > Makes sense. I've applied it. So far, so good. > I'm still having problems with this, even with the stock ext2fs/libpager code. To reproduce these bugs, I suggest making a copy of the virtual disk (hd0 -> hd1), booting a subhurd on hd1, then trying to build the entire hurd package (dpkg-buildpackage -b). I've yet to successfully build the hurd packages in a subhurd - there's always some bug or another that gets triggered. If not this one, then I've been having problems with the proc server. Back to the io_identity code... Part of the problem is that sizeof(ino_t) is 8 bytes, while sizeof(hurd_ihash_key_t) is only 4 bytes. So we need to use the "generalized key interface" to hash the full 8 bytes. But that doesn't fix everything. I'm seeing periodic double removal from the hash table. It originally manifested as an underflow on the weak reference count, but I added a line: assert_backtrace(hurd_ihash_value_valid(* i->id_hashloc)); right before the hurd_ihash_locp_remove in id_clean, and this assert periodically triggers. The most notable thing about it is that it always seems to be triggered by a NO SENDERS notification with a mscount field less than the mscount in the portinfo structure: #8 0x0807ed9d in __assert_fail_backtrace (assertion=0x817c080 "hurd_ihash_value_valid(* i->id_hashloc)", file=0x817c03c "/root/hurd-0.9.git20171119/build-deb/../libfshelp/get-identity.c", line=60, function=0x817c0a8 <__PRETTY_FUNCTION__.8282> "id_clean") at ./build-deb/../libshouldbeinlibc/assert-backtrace.c:61 #9 0x08073d61 in id_clean (cookie=0x20550ec0) at ./build-deb/../libfshelp/get-identity.c:60 #10 0x0807a20e in ports_port_deref (portstruct=0x20550ec0) at ./build-deb/../libports/port-deref.c:39 #11 0x0807a9bd in internal_demuxer (outheadp=0x22800ef0, inp=0x22802f00) at ./build-deb/../libports/manage-multithread.c:215 #12 synchronized_demuxer (inp=0x22802f00, outheadp=0x22800ef0) at ./build-deb/../libports/manage-multithread.c:239 (gdb) frame 11 #11 0x0807a9bd in internal_demuxer (outheadp=0x22800ef0, inp=0x22802f00) at ./build-deb/../libports/manage-multithread.c:215 215 ports_port_deref (pi); (gdb) x/8x inp 0x22802f00: 0x1700 0x0020 0x 0x20550ec0 0x22802f10: 0x 0x0046 0x10012002 0x0002 (gdb) print *pi $9 = {class = 0x209ae8, refcounts = {references = {hard = 0, weak = 1}, value = 4294967296}, mscount = 24, cancel_threshold = 0, flags = 0, port_right = 22723, current_rpcs = 0x0, bucket = 0x820e628, hentry = 0x84a7cf8, ports_htable_entry = 0x205b3610} Notice that the NO SENDER message had an mscount of 2 (the last word in the x/8x), but the portinfo structure has an mscount of 24. Also, there's no hard references and one weak reference in the portinfo structure. At this point there should be two weak references - one for the hash table entry and one for the demuxer. If the hash table assert didn't fail, then we'd be getting a weak reference underflow soon. I've puzzled over this for days, and this is my current theory for what's happening. We started with one hard reference (because we have outstanding send rights) and one weak reference (the hash table). Then we got a NO SENDERS message, so we started to drop the hard reference by demoting it to a weak reference, in libports/port-deref.c. Now we've got no hards, two weaks, the entry still in the hash table, and nothing locked (we're about to call the dropweak_routine). At this point, other threads preempt us. Our portinfo structure is still in the hash table, so more send rights get created (22 more, in the gdb trace above!), they get consumed, and another NO SENDERS message gets generated. It gets to the dropweak_routine with no hard references and three weak references (the two from before, plus one more from the second NO SENDERS' demuxer). It removes the hash table entry (dropping one weak reference), drops another weak reference (in ports_port_deref), and finishes running. Now the first thread (finally) gets to run again. It's now got no hard references, one weak reference, and the dropweak_routine runs on a portinfo structure that's already been removed from the hash table. It removes the hash table entry (again), drops the last weak reference, and then tries to drop another weak reference in ports_port_deref, which
Re: race condition in libports
On Sun, Dec 17, 2017 at 8:02 AM, Samuel Thibaultwrote: > Hello, > > Brent W. Baccala, on sam. 16 déc. 2017 21:37:05 -0500, wrote: > > basically, we're storing ports in a inode-to-port hash, looking them > > up when io_identity() gets called, and removing them from the hash > > when the class's clean routine gets called. > > That's the bug: one needs a reference for this. And it's a weak > reference: identity is fine with getting rid of it. Could you try the > attached patch? Makes sense. I've applied it. So far, so good. Now I feel stupid! Holding a port in a hash is such a classic example of a weak reference, but I didn't even think about moving that code out of the clean function. Thank you. agape brent
Re: race condition in libports
Hello, Brent W. Baccala, on sam. 16 déc. 2017 21:37:05 -0500, wrote: > basically, we're storing ports in a inode-to-port hash, looking them > up when io_identity() gets called, and removing them from the hash > when the class's clean routine gets called. That's the bug: one needs a reference for this. And it's a weak reference: identity is fine with getting rid of it. Could you try the attached patch? Samuel Index: hurd-debian/libfshelp/get-identity.c === --- hurd-debian.orig/libfshelp/get-identity.c +++ hurd-debian/libfshelp/get-identity.c @@ -42,7 +43,12 @@ id_clean (void *cookie) { struct idspec *i = cookie; pthread_mutex_lock (); - hurd_ihash_locp_remove (, i->id_hashloc); + if (refcounts_hard_references(>pi.refcounts) == 0) +{ + /* Nobody got a send right in between, we can remove from the hash. */ + hurd_ihash_locp_remove (, i->id_hashloc); + ports_port_deref_weak (>pi); +} pthread_mutex_unlock (); } @@ -50,7 +56,7 @@ static void id_initialize () { assert_backtrace (!idclass); - idclass = ports_create_class (id_clean, NULL); + idclass = ports_create_class (NULL, id_clean); } error_t @@ -75,6 +81,9 @@ fshelp_get_identity (struct port_bucket if (err) goto lose_port; + /* Weak reference for the hash entry. */ + ports_port_ref_weak(>pi); + *pt = ports_get_right (i); ports_port_deref (i); }
race condition in libports
Hi - I'm making good progress on the multi-client libpager. I've been running it on my root filesystem for about a month now, with few problems recently. However, there are still some bugs. One seems to be in libports. It manifests like this: /hurd/ext2fs.static: ../../libports/../libshouldbeinlibc/refcount.h:171: refcounts_ref: Assertion '! (r.hard == 1 && r.weak == 0) || !"refcount detected use-after-free!"' failed. /hurd/ext2fs.static: ../../libports/complete-deallocate.c:41: _ports_complete_deallocate: Assertion '! "reacquired reference w/o send rights"' failed. gdb indicates that the port in question was generated by libfshelp/get-identity.c. That file's a short read; basically, we're storing ports in a inode-to-port hash, looking them up when io_identity() gets called, and removing them from the hash when the class's clean routine gets called. I think what's happening is that we have a port that loses its last send right, and after its refcount is decremented but before its clean routine gets called, another call to io_identity() pulls it out of the hash. Then you've got ports_get_right complaining (that's the first line) that it's incrementing a zero refcount, and ports_port_deref complaining (that's the second line) that it deallocating a port that now has send rights. Looking at the tail end of libports/no-senders.c, you'll see that ports_port_deref gets called after we've dropped the mutex on _ports_lock. I'm thinking that we need to hold that mutex all the way until the class's clean routine has returned in order to assure that the refcount get decremented and the port gets removed from the hash atomically. Of course, that requires holding a global lock while the clean routine runs. It seems to me that only the port in question needs to be locked, but the individual ports don't seem to have mutexs associated with them. Any ideas what to do? agape brent