Re: [QUESTION] {start,stop}_this_handle() and lockdep annotations

2022-11-04 Thread Theodore Ts'o
Note: in the future, I'd recommend looking at the MAINTAINERS to
figure out a smaller list of people to ask this question, instead of
spamming everyone who has ever expressed interest in DEPT.


On Fri, Nov 04, 2022 at 02:56:32PM +0900, Byungchul Park wrote:
> Peterz (commit 34a3d1e8370870 lockdep: annotate journal_start()) and
> the commit message quoted what Andrew Morton said. It was like:
> 
> > Except lockdep doesn't know about journal_start(), which has ranking
> > requirements similar to a semaphore.
> 
> Could anyone tell what the ranking requirements in the journal code
> exactly means and what makes {start,stop}_this_handle() work for the
> requirements?

The comment from include/linux/jbd2.h may be helpful:

#ifdef CONFIG_DEBUG_LOCK_ALLOC
/**
 * @j_trans_commit_map:
 *
 * Lockdep entity to track transaction commit dependencies. Handles
 * hold this "lock" for read, when we wait for commit, we acquire the
 * "lock" for writing. This matches the properties of jbd2 journalling
 * where the running transaction has to wait for all handles to be
 * dropped to commit that transaction and also acquiring a handle may
 * require transaction commit to finish.
 */
struct lockdep_map  j_trans_commit_map;
#endif

And the reason why this isn't a problem is because start_this_handle()
can be passed a special handle which is guaranteed to not block
(because we've reserved journal credits for it).  Hence, there is no
risk that in _this_ call path start_this_handle() will block for a
commit:

> <4>[   43.124442 ] stacktrace:
> <4>[   43.124443 ]   start_this_handle+0x557/0x620
> <4>[   43.124445 ]   jbd2_journal_start_reserved+0x4d/0x1b0
   ^^^
> <4>[   43.124448 ]   __ext4_journal_start_reserved+0x6d/0x190
> <4>[   43.124450 ]   ext4_convert_unwritten_io_end_vec+0x22/0xd0
> <4>[   43.124453 ]   ext4_end_io_rsv_work+0xe4/0x190
> <4>[   43.124455 ]   process_one_work+0x301/0x660
> <4>[   43.124458 ]   worker_thread+0x3a/0x3c0
> <4>[   43.124459 ]   kthread+0xef/0x120
> <4>[   43.124462 ]   ret_from_fork+0x22/0x30


The comment for this function from fs/jbd2/transaction.c:

/**
 * jbd2_journal_start_reserved() - start reserved handle
 * @handle: handle to start
 * @type: for handle statistics
 * @line_no: for handle statistics
 *
 * Start handle that has been previously reserved with jbd2_journal_reserve().
 * This attaches @handle to the running transaction (or creates one if there's
 * not transaction running). Unlike jbd2_journal_start() this function cannot
 * block on journal commit, checkpointing, or similar stuff. It can block on
 * memory allocation or frozen journal though.
 *
 * Return 0 on success, non-zero on error - handle is freed in that case.
 */

And this is why this will never be a problem in real life, or flagged
by Lockdep, since Lockdep is a dynamic checker.  The deadlock which
the static DEPT checker has imagined can never, ever, ever happen.

For more context, also from fs/jbd2/transaction.c:

/**
 * jbd2_journal_start() - Obtain a new handle.
 * @journal: Journal to start transaction on.
 * @nblocks: number of block buffer we might modify
 *
 * We make sure that the transaction can guarantee at least nblocks of
 * modified buffers in the log.  We block until the log can guarantee
 * that much space. Additionally, if rsv_blocks > 0, we also create another
 * handle with rsv_blocks reserved blocks in the journal. This handle is
 * stored in h_rsv_handle. It is not attached to any particular transaction
 * and thus doesn't block transaction commit. If the caller uses this reserved
 * handle, it has to set h_rsv_handle to NULL as otherwise jbd2_journal_stop()
 * on the parent handle will dispose the reserved one. Reserved handle has to
 * be converted to a normal handle using jbd2_journal_start_reserved() before
 * it can be used.
 *
 * Return a pointer to a newly allocated handle, or an ERR_PTR() value
 * on failure.
 */

To be clear, I don't blame DEPT for not being able to figure this out;
it would require human-level intelligence to understand why in *this*
call path, we never end up waiting.  But this is why I am very
skeptical of static analyzers which are *sure* that anything they flag
is definitely a bug.  We definitely will need a quick and easy way to
tell DEPT, "go home, you're drunk".

Hope this helps,

- Ted


P.S.  Note: the actual function names are a bit misleading.  It looks
like the functions got refactored, and the documentation wasn't
updated to match.  Sigh... fortuantely the concepts are accurate; it's
just that function names needs to be fixed up.  For example, creating
a reserved handle is no longer done via jbd2_journal_start(), but
rather jbd2__journal_start().  



[QUESTION] {start,stop}_this_handle() and lockdep annotations

2022-11-04 Thread Byungchul Park
Hello,

With v6.1-rc3, a deadlock in the journal code was reported by DEPT. I'd
need to understand the journal code first before deep dive into it.

I found a few lockdep annotations was added into the journal code by
Peterz (commit 34a3d1e8370870 lockdep: annotate journal_start()) and
the commit message quoted what Andrew Morton said. It was like:

> Except lockdep doesn't know about journal_start(), which has ranking
> requirements similar to a semaphore.

Could anyone tell what the ranking requirements in the journal code
exactly means and what makes {start,stop}_this_handle() work for the
requirements?

Any comments would be appreciated. If needed, I'll explan the following
report and scenario in detail.

Just for your information, let me paste the DEPT report below.

Byungchul

---
<4>[   43.124364 ] ===
<4>[   43.124366 ] DEPT: Circular dependency has been detected.
<4>[   43.124367 ] 6.1.0-rc3-Trybot_110402v1-g601b2ef606e4+ #1 Not tainted
<4>[   43.124369 ] ---
<4>[   43.124369 ] summary
<4>[   43.124370 ] ---
<4>[   43.124370 ] *** DEADLOCK ***

<4>[   43.124371 ] context A
<4>[   43.124372 ] [S] start_this_handle(jbd2_handle:0)
<4>[   43.124373 ] [W] folio_wait_bit_common(pglocked:0)
<4>[   43.124374 ] [E] up_read(jbd2_handle:0)

<4>[   43.124376 ] context B
<4>[   43.124377 ] [S] (unknown)(pgwriteback:0)
<4>[   43.124378 ] [W] start_this_handle(jbd2_handle:0)
<4>[   43.124379 ] [E] folio_wake_bit(pgwriteback:0)

<4>[   43.124380 ] context C
<4>[   43.124381 ] [S] (unknown)(pglocked:0)
<4>[   43.124382 ] [W] folio_wait_bit_common(pgwriteback:0)
<4>[   43.124383 ] [E] folio_wake_bit(pglocked:0)

<4>[   43.124384 ] [S]: start of the event context
<4>[   43.124384 ] [W]: the wait blocked
<4>[   43.124385 ] [E]: the event not reachable
<4>[   43.124385 ] ---
<4>[   43.124386 ] context A's detail
<4>[   43.124387 ] ---
<4>[   43.124387 ] context A
<4>[   43.124388 ] [S] start_this_handle(jbd2_handle:0)
<4>[   43.124389 ] [W] folio_wait_bit_common(pglocked:0)
<4>[   43.124389 ] [E] up_read(jbd2_handle:0)

<4>[   43.124391 ] [S] start_this_handle(jbd2_handle:0):
<4>[   43.124392 ] [] start_this_handle+0x4bc/0x620
<4>[   43.124398 ] stacktrace:
<4>[   43.124399 ]   start_this_handle+0x57c/0x620
<4>[   43.124401 ]   jbd2__journal_start+0xe8/0x270
<4>[   43.124404 ]   __ext4_journal_start_sb+0x147/0x190
<4>[   43.124407 ]   ext4_writepages+0x76f/0x1120
<4>[   43.124409 ]   do_writepages+0xc1/0x180
<4>[   43.124412 ]   file_write_and_wait_range+0xab/0xf0
<4>[   43.124414 ]   ext4_sync_file+0x122/0x480
<4>[   43.124416 ]   do_fsync+0x33/0x70
<4>[   43.124419 ]   __x64_sys_fdatasync+0xe/0x20
<4>[   43.124421 ]   do_syscall_64+0x44/0x90
<4>[   43.124424 ]   entry_SYSCALL_64_after_hwframe+0x63/0xcd

<4>[   43.124427 ] [W] folio_wait_bit_common(pglocked:0):
<4>[   43.124428 ] [] mpage_prepare_extent_to_map+0x446/0x7a0

<4>[   43.124431 ] [E] up_read(jbd2_handle:0):
<4>[   43.124431 ] (N/A)
<4>[   43.124432 ] ---
<4>[   43.124433 ] context B's detail
<4>[   43.124433 ] ---
<4>[   43.124434 ] context B
<4>[   43.124434 ] [S] (unknown)(pgwriteback:0)
<4>[   43.124435 ] [W] start_this_handle(jbd2_handle:0)
<4>[   43.124436 ] [E] folio_wake_bit(pgwriteback:0)

<4>[   43.124437 ] [S] (unknown)(pgwriteback:0):
<4>[   43.124438 ] (N/A)

<4>[   43.124439 ] [W] start_this_handle(jbd2_handle:0):
<4>[   43.124440 ] [] start_this_handle+0x4bc/0x620
<4>[   43.124442 ] stacktrace:
<4>[   43.124443 ]   start_this_handle+0x557/0x620
<4>[   43.124445 ]   jbd2_journal_start_reserved+0x4d/0x1b0
<4>[   43.124448 ]   __ext4_journal_start_reserved+0x6d/0x190
<4>[   43.124450 ]   ext4_convert_unwritten_io_end_vec+0x22/0xd0
<4>[   43.124453 ]   ext4_end_io_rsv_work+0xe4/0x190
<4>[   43.124455 ]   process_one_work+0x301/0x660
<4>[   43.124458 ]   worker_thread+0x3a/0x3c0
<4>[   43.124459 ]   kthread+0xef/0x120
<4>[   43.124462 ]   ret_from_fork+0x22/0x30

<4>[   43.124464 ] [E] folio_wake_bit(pgwriteback:0):
<4>[   43.124465 ] [] folio_end_writeback+0x78/0xf0
<4>[   43.124467 ] stacktrace:
<4>[   43.124468 ]   folio_wake_bit+0x194/0x1a0
<4>[   43.124470 ]   folio_end_writeback+0x78/0xf0
<4>[   43.124471 ]   ext4_finish_bio+0x23f/0x340
<4>[   43.124473 ]   ext4_release_io_end+0x43/0xf0
<4>[   43.124474 ]   ext4_end_io_rsv_work+0x90/0x190
<4>[   43.124476 ]   process_one_work+0x301/0x660
<4>[   43.124478 ]   worker_thread+0x3a/0x3c0
<4>[   43.124480 ]   kthread+0xef/0x120
<4>[   43.124481 ]   ret_from_fork+0x22/0x30
<4>[