Hi Changwei,

On 2018/1/25 20:30, Changwei Ge wrote:
> Hi Jun,
> 
> On 2018/1/25 20:18, piaojun wrote:
>> Hi Changwei,
>>
>> On 2018/1/25 19:59, Changwei Ge wrote:
>>> Hi Jun,
>>>
>>> On 2018/1/25 10:41, piaojun wrote:
>>>> We should not reuse the dirty bh in jbd2 directly due to the following
>>>> situation:
>>>>
>>>> 1. When removing extent rec, we will dirty the bhs of extent rec and
>>> Quick questions:
>>> Do you mean current code puts modifying extent records belonging to a 
>>> certain file and modifying truncate log into the same transaction?
>>> If so, forgive me since I didn't figure it out. Could you point out in your 
>>> following sequence diagram?
>>>
>>> Afterwards, I can understand the issue your change log is describing better.
>>>
>>> Thanks,
>>> Changwei
>>>
>> Yes, I mean they are in the same transaction as below:
>>
>> ocfs2_remove_btree_range
>>    ocfs2_remove_extent // modify extent records
>>      ocfs2_truncate_log_append // modify truncate log
> 
> If so I think the transaction including operations on extent and truncate log 
> won't be committed.
> And journal should already be aborted if interval transaction commit thread 
> has been woken.
> So no metadata will be changed.
> And later, ocfs2_truncate_log_worker shouldn't see any inode on truncate log.
> Are you sure this is the root cause of your problem?
> I feel a little strange for it.
> 
> Thanks,
> Changwei
> 
As you said, the transaction was not committed, but after a while the
bh of truncate log was committed in another transaction. I'm sure for
the cause and after applying this patch, the duplicate cluster problem
is gone. I have tested it a few month.

thanks,
Jun

>>
>> thanks,
>> Jun
>>
>>>>      truncate log at the same time, and hand them over to jbd2.
>>>> 2. The bhs are not flushed to disk due to abnormal storage link.
>>>> 3. After a while the storage link become normal. Truncate log flush
>>>>      worker triggered by the next space reclaiming found the dirty bh of
>>>>      truncate log and clear its 'BH_Write_EIO' and then set it uptodate
>>>>      in __ocfs2_journal_access():
>>>>
>>>> ocfs2_truncate_log_worker
>>>>     ocfs2_flush_truncate_log
>>>>       __ocfs2_flush_truncate_log
>>>>         ocfs2_replay_truncate_records
>>>>           ocfs2_journal_access_di
>>>>             __ocfs2_journal_access // here we clear io_error and set 
>>>> 'tl_bh' uptodata.
>>>>
>>>> 4. Then jbd2 will flush the bh of truncate log to disk, but the bh of
>>>>      extent rec is still in error state, and unfortunately nobody will
>>>>      take care of it.
>>>> 5. At last the space of extent rec was not reduced, but truncate log
>>>>      flush worker have given it back to globalalloc. That will cause
>>>>      duplicate cluster problem which could be identified by fsck.ocfs2.
>>>>
>>>> So we should return -EIO in case of ruining atomicity and consistency
>>>> of space reclaim.
>>>>
>>>> Fixes: acf8fdbe6afb ("ocfs2: do not BUG if buffer not uptodate in 
>>>> __ocfs2_journal_access")
>>>>
>>>> Signed-off-by: Jun Piao <piao...@huawei.com>
>>>> Reviewed-by: Yiwen Jiang <jiangyi...@huawei.com>
>>>> ---
>>>>    fs/ocfs2/journal.c | 45 +++++++++++++++++++++++++++++++++++----------
>>>>    1 file changed, 35 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
>>>> index 3630443..d769ca2 100644
>>>> --- a/fs/ocfs2/journal.c
>>>> +++ b/fs/ocfs2/journal.c
>>>> @@ -666,21 +666,46 @@ static int __ocfs2_journal_access(handle_t *handle,
>>>>            /* we can safely remove this assertion after testing. */
>>>>            if (!buffer_uptodate(bh)) {
>>>>                    mlog(ML_ERROR, "giving me a buffer that's not 
>>>> uptodate!\n");
>>>> -          mlog(ML_ERROR, "b_blocknr=%llu\n",
>>>> -               (unsigned long long)bh->b_blocknr);
>>>> +          mlog(ML_ERROR, "b_blocknr=%llu, b_state=0x%lx\n",
>>>> +               (unsigned long long)bh->b_blocknr, bh->b_state);
>>>>
>>>>                    lock_buffer(bh);
>>>>                    /*
>>>> -           * A previous attempt to write this buffer head failed.
>>>> -           * Nothing we can do but to retry the write and hope for
>>>> -           * the best.
>>>> +           * We should not reuse the dirty bh directly due to the
>>>> +           * following situation:
>>>> +           *
>>>> +           * 1. When removing extent rec, we will dirty the bhs of
>>>> +           *    extent rec and truncate log at the same time, and
>>>> +           *    hand them over to jbd2.
>>>> +           * 2. The bhs are not flushed to disk due to abnormal
>>>> +           *    storage link.
>>>> +           * 3. After a while the storage link become normal.
>>>> +           *    Truncate log flush worker triggered by the next
>>>> +           *    space reclaiming found the dirty bh of truncate log
>>>> +           *    and clear its 'BH_Write_EIO' and then set it uptodate
>>>> +           *    in __ocfs2_journal_access():
>>>> +           *
>>>> +           *    ocfs2_truncate_log_worker
>>>> +           *      ocfs2_flush_truncate_log
>>>> +           *        __ocfs2_flush_truncate_log
>>>> +           *          ocfs2_replay_truncate_records
>>>> +           *            ocfs2_journal_access_di
>>>> +           *              __ocfs2_journal_access
>>>> +           *
>>>> +           * 4. Then jbd2 will flush the bh of truncate log to disk,
>>>> +           *    but the bh of extent rec is still in error state, and
>>>> +           *    unfortunately nobody will take care of it.
>>>> +           * 5. At last the space of extent rec was not reduced,
>>>> +           *    but truncate log flush worker have given it back to
>>>> +           *    globalalloc. That will cause duplicate cluster problem
>>>> +           *    which could be identified by fsck.ocfs2.
>>>> +           *
>>>> +           * So we should return -EIO in case of ruining atomicity
>>>> +           * and consistency of space reclaim.
>>>>                     */
>>>>                    if (buffer_write_io_error(bh) && !buffer_uptodate(bh)) {
>>>> -                  clear_buffer_write_io_error(bh);
>>>> -                  set_buffer_uptodate(bh);
>>>> -          }
>>>> -
>>>> -          if (!buffer_uptodate(bh)) {
>>>> +                  mlog(ML_ERROR, "A previous attempt to write this "
>>>> +                          "buffer head failed\n");
>>>>                            unlock_buffer(bh);
>>>>                            return -EIO;
>>>>                    }
>>>>
>>> .
>>>
>>
> .
> 

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Reply via email to