Public bug reported:

SRU Justification
-----------------

[Impact]

A kernel BUG is sometimes observed when using fscache:
    [4740718.880898] FS-Cache:
    [4740718.880920] FS-Cache: Assertion failed
    [4740718.880934] FS-Cache: 0 > 0 is false
    [4740718.881001] ------------[ cut here ]------------
    [4740718.881017] kernel BUG at /usr/src/linux-4.4.0/fs/fscache/operation.c:449!
    [4740718.881040] invalid opcode: 0000 [#1] SMP
    
    [4740718.892659] Call Trace:
    [4740718.893506]  [<ffffffffc1464cf9>] cachefiles_read_copier+0x3a9/0x410 [cachefiles]
    [4740718.894374]  [<ffffffffc037e272>] fscache_op_work_func+0x22/0x50 [fscache]
    [4740718.895180]  [<ffffffff81096da0>] process_one_work+0x150/0x3f0
    [4740718.895966]  [<ffffffff8109751a>] worker_thread+0x11a/0x470
    [4740718.896753]  [<ffffffff81808e59>] ? __schedule+0x359/0x980
    [4740718.897783]  [<ffffffff81097400>] ? rescuer_thread+0x310/0x310
    [4740718.898581]  [<ffffffff8109cdd6>] kthread+0xd6/0xf0
    [4740718.899469]  [<ffffffff8109cd00>] ? kthread_park+0x60/0x60
    [4740718.900477]  [<ffffffff8180d0cf>] ret_from_fork+0x3f/0x70
    [4740718.901514]  [<ffffffff8109cd00>] ? kthread_park+0x60/0x60

[Problem]

In include/linux/fscache-cache.h, fscache_retrieval_complete reads, in part:

            atomic_sub(n_pages, &op->n_pages);
            if (atomic_read(&op->n_pages) <= 0)
                    fscache_op_complete(&op->op, true);

The code calls atomic_sub and then a separate atomic_read, so the
decrement and the test are not a single atomic step. Two threads
decrementing the page count can race: both finish their atomic_sub
before either reads, both then see op->n_pages <= 0, and both call
fscache_op_complete, which leads to the OOPS.
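
The interleaving can be replayed by hand in userspace. This is only a
sketch, assuming C11 <stdatomic.h> as a stand-in for the kernel atomics;
the n_pages variable and replay_racy_interleaving() are hypothetical
names for illustration, not kernel code. Both callers subtract first and
then both read, so both pass the <= 0 test:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for op->n_pages. */
static atomic_int n_pages;

/*
 * Replays the bad ordering sequentially: A and B each do their
 * subtraction before either one reads the counter back.
 * Returns how many callers would go on to complete the op.
 */
int replay_racy_interleaving(void)
{
    int completions = 0;

    atomic_store(&n_pages, 2);

    atomic_fetch_sub(&n_pages, 1);      /* caller A: atomic_sub  */
    atomic_fetch_sub(&n_pages, 1);      /* caller B: atomic_sub  */
    assert(atomic_load(&n_pages) == 0); /* counter already at 0  */

    if (atomic_load(&n_pages) <= 0)     /* caller A: atomic_read */
        completions++;
    if (atomic_load(&n_pages) <= 0)     /* caller B: atomic_read */
        completions++;

    return completions;
}
```

With this ordering the function returns 2, which corresponds to
fscache_op_complete being called twice for the same operation.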

[Fix]
The fix is trivial: use atomic_sub_return, so that the decrement and the
test happen in one atomic operation instead of two separate calls.
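
The shape of that change can be sketched in userspace. This is not the
actual patch: it assumes C11 <stdatomic.h>, and sub_return(),
retrieval_complete() and replay_fixed() are hypothetical helpers that
only mimic the kernel's atomic_sub_return semantics (subtract and return
the new value in one step). Each caller tests the value produced by its
own subtraction:

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical analog of atomic_sub_return(): atomic_fetch_sub()
 * returns the OLD value, so subtract n again to get the new one.
 */
static int sub_return(atomic_int *v, int n)
{
    return atomic_fetch_sub(v, n) - n;
}

/* Returns 1 if this caller's own decrement brought the count to <= 0. */
static int retrieval_complete(atomic_int *n_pages, int n)
{
    return sub_return(n_pages, n) <= 0;
}

/* Same two callers as in the racy case, each retiring one page. */
int replay_fixed(void)
{
    atomic_int n_pages;
    int completions = 0;

    atomic_init(&n_pages, 2);

    completions += retrieval_complete(&n_pages, 1); /* A sees 1 */
    completions += retrieval_complete(&n_pages, 1); /* B sees 0 */
    assert(atomic_load(&n_pages) == 0);

    return completions;
}
```

Here only the second caller sees the count reach zero, so replay_fixed()
returns 1 regardless of how the two subtractions are ordered.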

[Testcase]
The user has tested the patch successfully on their fscache/cachefiles setup.

[Regression Potential]
Limited to fscache. Small, comprehensible change.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Incomplete

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1797314

Title:
  fscache: bad refcounting in fscache_op_complete leads to OOPS
