Hi,

I believe I have found a race condition in the ZFS ARC code.

The problem manifests itself only when debugging is turned on and 
arc_mru->arcs_size is very close to arc_mru->arcs_lsize.

It causes these assertions in arc.c to fail:
        1) In remove_reference():
           ASSERT3U(state->arcs_size, >=, state->arcs_lsize);
        2) In arc_change_state():
           ASSERT3U(new_state->arcs_size + to_delta, >=, new_state->arcs_lsize);

Steps to reproduce:
        Well, in zfs-fuse it's just a matter of compiling in debug mode and 
running 'bonnie++ -f -d /pool'. It fails a couple of minutes into the rewrite 
test with the arc_change_state() assertion.

Currently, zfs-fuse does a lot of context switching, which may trigger this
bug more frequently than on Solaris. With the bonnie++ workload, I have seen
as many as 6 threads inside arc_change_state() at the same time, just before
it fails.

Also, I have a relatively low (64 MB) arc_c_max value, which might be 
relevant.

It is much easier to reproduce if you apply this patch (relative to the 
current mercurial tip):

diff -r 773fb303fd36 usr/src/uts/common/fs/zfs/arc.c
--- a/usr/src/uts/common/fs/zfs/arc.c   Mon Feb 19 05:28:47 2007 -0800
+++ b/usr/src/uts/common/fs/zfs/arc.c   Tue Feb 20 05:23:03 2007 +0000
@@ -834,6 +834,8 @@ arc_change_state(arc_state_t *new_state,
 
                        if (use_mutex)
                                mutex_exit(&new_state->arcs_mtx);
+                       if (new_state == arc_mru)
+                               delay(hz); // sleep 1 second
                }
        }

With this patch, zfs-fuse will always crash in the remove_reference() 
assertion simply by mounting a filesystem (when compiled in debug mode, of 
course).

Unfortunately, even with that patch, it's a little hard to trigger the bug 
with ztest because arc_mru->arcs_size gets much bigger than 
arc_mru->arcs_lsize after a while. I had moderate success with the following 
steps:

        1) Apply the patch.
        2) Change the ztest_dmu_write_parallel test frequency from
           zopt_always to zopt_rarely (I'm unsure how much this actually
           helps).
        3) Compile in debug mode.
        4) Run ztest with parameters "-T600 -P3".
        5) Keep retrying. If it doesn't fail within the first 10 minutes,
           I think it's usually better to restart ztest from the beginning.

I have fixed this bug with the attached patch. I don't really like it very
much, but it does fix the race.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arc.patch
Type: text/x-diff
Size: 823 bytes
Desc: not available
URL: <http://mail.opensolaris.org/pipermail/zfs-code/attachments/20070220/7be98875/attachment.bin>
