I'm CC:ing this to uml-devel, finally.
On Wednesday 29 March 2006 02:41, Jeff Dike wrote:
> On Wed, Mar 29, 2006 at 12:40:00AM +0200, Blaisorblade wrote:
> > Since you've now split out delete-hostfs, why don't you merge the new
> > hostfs, possibly labeling it as "EXPERIMENTAL"?
> My two main gripes right now are
> there are three filesystems (or one framework and two
> filesystems) in one directory
That's not a big problem, but it's one we'll solve after we have users.
> it depends on filehandle, which has aspects that neither of us likes
We're working on this too slowly.
Give me bug reports from users and filehandle in mainline and I'll probably fix
it. Right now I want to, but I'm worried about other stuff (I should get back
to working on remap_file_pages now). I'll probably be able to work on both,
since I can't spend more than a certain amount of time on that anyway.
> > With the refcount, likely you don't need to move it off-list, but that
> > could maybe be useful to avoid looping on unused fd; however, this
> > requires taking the spinlock when reinserting the element on the list.
> I think you're agreeing with me, but I'm missing why we want to move
> things on and off the list.
I just said "we could do that so when do do reclaim we don't iterate on fds
which are held somewhere", but IMHO the other side has better reasons to be
preferred (i.e. possibly skipping taking the lock when reputting it on list).
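To make the tradeoff concrete, here's an untested sketch (all the names are
invented) of a reclaim pass when busy fds stay on the list:

	/* Hypothetical refcounted fd entry, kept on a global list. */
	struct fd_entry {
		struct list_head list;
		atomic_t count;
		int fd;
	};

	static LIST_HEAD(fd_list);
	static DEFINE_SPINLOCK(fd_lock);

	static void fd_reclaim(void)
	{
		struct fd_entry *entry, *next;
		LIST_HEAD(doomed);

		spin_lock(&fd_lock);
		list_for_each_entry_safe(entry, next, &fd_list, list){
			if(atomic_read(&entry->count) > 0)
				continue; /* held somewhere - the looping we'd skip */
			list_move(&entry->list, &doomed);
		}
		spin_unlock(&fd_lock);

		/* Close only after dropping the lock (os_close_file() being
		 * UML's wrapper around the host close()). */
		list_for_each_entry_safe(entry, next, &doomed, list){
			os_close_file(entry->fd);
			kfree(entry);
		}
	}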
> > Note: testing that a fd has refcount 0 must be done atomically with
> > freeing it; i.e. even with an atomic_t refcount, it must be incremented
> > only while you have a lock on the list;
> > and while freeing it,
There I'd rather say "while closing it". But we don't close fds when they
reach refcount 0, so in that case we can probably avoid taking the lock.
Instead, when closing an fd, we must still take the lock, move it off-list,
drop the lock, and only then close the fd.
> > you must
> > use
> > atomic_dec_and_lock() so you get a lock on the list if the refcount goes
> > to 0 (atomic_dec_and_lock() is equivalent to taking the lock, doing dec
> > and test, and releasing it, but is faster).
> Yes, this looks reasonable.
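To spell out what I mean, reusing the sketch above (again untested, names
invented):

	/* Drop a reference: atomic_dec_and_lock() returns holding fd_lock
	 * iff the count reached 0, so the zero test is atomic with respect
	 * to the list. Since we don't close fds at refcount 0, we might get
	 * away with a plain atomic_dec() here instead. */
	static void fd_put(struct fd_entry *entry)
	{
		if(atomic_dec_and_lock(&entry->count, &fd_lock)){
			/* mark it reclaimable, or move it off-list */
			spin_unlock(&fd_lock);
		}
	}

	/* Closing is separate: take the lock, unlink, drop the lock, and
	 * only then close, so we never do the close under the spinlock. */
	static void fd_close(struct fd_entry *entry)
	{
		spin_lock(&fd_lock);
		list_del(&entry->list);
		spin_unlock(&fd_lock);
		os_close_file(entry->fd);
		kfree(entry);
	}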
I've been especially careful because, after studying Operating Systems (last
semester), I noticed a bug in the description of your ubd_sequence patch
(against 2.6.14-rc1-mm1, sent as an attachment in a mail); I don't recall the
details, including the names, but I remember the bug. I've described it below,
but I've since discovered that it doesn't apply, given how wait_queues work in
Linux. I haven't had the courage to press Del...
However, while looking at this, I think I've found all sorts of race
conditions and locking problems in the ubd driver.
I'm saving what I see in a patch and will look at this properly later, but I
think I've seen tons of problems.
=============================
I decided not to describe it at the time because you dropped that
implementation, but I'm doing it now, below (maybe because I'm silly).
There were two atomic_t sequence counters (started_req and completed_req) and
one wait queue - a task would hang on the wait queue until completed_req
caught up with started_req, at which point it was woken up.
Well, there was a bug, since there's no semaphore in the design - task A,
which submits requests, could check that started_req > completed_req (each
counter is atomic, but reading the pair of them is not - it was perfectly
possible for it to observe started_req < completed_req), then task B could
increment completed_req and send the wakeup to nobody, and then task A could
go to sleep, never to be woken up.
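In code, the race would have looked roughly like this (reconstructed from
memory, so the names are invented):

	static DECLARE_WAIT_QUEUE_HEAD(wait_list);
	static atomic_t completed_req = ATOMIC_INIT(0);

	/* Task A, checking whether it's its turn yet: */
	if(atomic_read(&completed_req) < my_seq){
		/* Task B can run right here:
		 *	atomic_inc(&completed_req);
		 *	wake_up(&wait_list);  <- nobody is on the queue yet
		 */
		sleep_on(&wait_list); /* sleeps, possibly forever */
	}

(sleep_on() is exactly the kind of interface that has this check-then-sleep
race built in.)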
***
In Linux, however, wait_event() puts the task on the wait queue, so that it
can catch wakeups, tests the condition, and calls schedule(); since it's on
the wait_queue before testing the condition, I assume that wakeups done while
it's still running after testing the condition are caught.
***
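For reference, __wait_event() boils down to roughly this (simplified from the
2.6 sources):

	DEFINE_WAIT(wait);

	for(;;){
		/* Enqueue the task and mark it TASK_UNINTERRUPTIBLE
		 * *before* testing the condition... */
		prepare_to_wait(&wq, &wait, TASK_UNINTERRUPTIBLE);
		if(condition)
			break;
		/* ...so a wake_up() arriving between the test and this
		 * point just sets the task back to TASK_RUNNING, and
		 * schedule() returns instead of blocking. */
		schedule();
	}
	finish_wait(&wq, &wait);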
Actually, depending on the details, since task A submits all the requests and
is now sleeping, started_req could never increase again, and A would never be
awakened; otherwise it would be woken up later, if other requests somehow got
started and completed, causing another wake-up.
Indeed, you need the condition check and the sleep to be atomic, i.e.
protected by a semaphore; in Linux they actually use a spinlock, which they
manage to drop once the task is on the queue and has decided to go to sleep,
but hasn't actually done so yet.
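(That spinlock is the wait queue's own lock; prepare_to_wait() does roughly
this, simplified from kernel/wait.c:

	spin_lock_irqsave(&q->lock, flags);
	if(list_empty(&wait->task_list))
		__add_wait_queue(q, wait);
	set_current_state(TASK_UNINTERRUPTIBLE);
	spin_unlock_irqrestore(&q->lock, flags);

so the task is on the queue and marked sleeping before the lock is dropped,
and only later does it call schedule().)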
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
# It is important that I/O requests be submitted to the host in the
# order that they are received by the driver, especially since the
# driver can sleep. This patch adds two atomic counters, one for
# requests started and one for requests submitted to the host. A new
# request can proceed when started (after being incremented) is one
# more than submitted. submitted is incremented after all pieces of
# the request have been sent to the host.
# When a request is proceeding out of order, it will sleep on a wait
# queue until its number comes up.
#
# Interaction between this wait_queue and the emergency allocation
# semaphore - a request will only try to allocate data when it is
# its turn to go. The wait_queue is finished before the semaphore is
# acquired, so there can't be a deadlock between one process holding
# the allocation semaphore and sleeping because it's out of order and
# one proceeding in order and sleeping on the allocation semaphore.
#
# This currently doesn't correctly handle the bitmap, which must be
# written after the data has been finished. These must be sequenced
# in the same way as the data.
Index: test/arch/um/drivers/ubd_kern.c
===================================================================
--- test.orig/arch/um/drivers/ubd_kern.c 2005-09-17 17:40:34.000000000 -0400
+++ test/arch/um/drivers/ubd_kern.c 2005-09-17 17:40:53.000000000 -0400
@@ -1297,6 +1297,10 @@
return(err);
}
+static atomic_t started = ATOMIC_INIT(0);
+static atomic_t submitted = ATOMIC_INIT(0);
+DECLARE_WAIT_QUEUE_HEAD(sequence_queue);
+
void do_io(struct io_thread_req *req, struct request *r, unsigned long *bitmap)
{
struct ubd_aio *aio;
@@ -1304,14 +1308,18 @@
char *buf;
void *bitmap_buf = NULL;
unsigned long len, sector;
- int nsectors, start, end, bit, err;
+ int nsectors, start, end, bit, err, want;
__u64 off;
- if(req->bitmap_start != -1){
- bitmap_io = alloc_bitmap_io();
+ want = atomic_add_return(1, &started);
+ wait_event(sequence_queue, want - 1 == atomic_read(&submitted));
+ if(req->bitmap_start != -1){
/* Round up to the nearest word */
int round = sizeof(unsigned long);
+
+ bitmap_io = alloc_bitmap_io();
+
len = (req->bitmap_end - req->bitmap_start +
round * 8 - 1) / (round * 8);
len *= round;
@@ -1378,4 +1386,7 @@
start = end;
} while(start < nsectors);
+
+ atomic_inc(&submitted);
+ wake_up(&sequence_queue);
}