Re: [PATCH v2 7/9] nilfs2: ensure that all dirty blocks are written out

2015-06-01 Thread Andreas Rohner
On 2015-06-01 06:13, Ryusuke Konishi wrote:
> On Sun, 10 May 2015 13:04:18 +0200, Andreas Rohner wrote:
>> On 2015-05-09 14:17, Ryusuke Konishi wrote:
>>> On Sun,  3 May 2015 12:05:20 +0200, Andreas Rohner wrote:
> [...]
>>>
>>> Uum. This still looks to have the potential to leak dirty blocks
>>> between the DAT and SUFILE collection stages, since this retry is
>>> limited by a fixed retry count.
>>>
>>> How about adding a function that temporarily turns off live block
>>> tracking and using it after this propagation loop until the log
>>> write finishes?
>>>
>>> It would reduce the accuracy of the live block count, but is that
>>> good enough?  What do you think?  We have to eliminate the
>>> possibility of the leak because it can cause file system corruption.
>>> Every checkpoint must be self-contained.
>>
>> How exactly could it lead to file system corruption? Maybe I am
>> missing something important here, but it seems to me that no
>> corruption is possible.
>>
>> The nilfs_sufile_flush_cache_node() function only reads in already
>> existing blocks. No new blocks are created. If I mark those blocks
>> dirty, the btree is not changed at all. If I do not call
>> nilfs_bmap_propagate(), then the btree stays unchanged and there are no
>> dangling pointers. The resulting checkpoint should be self-contained.
> 
> Good point.  As for the btree, it looks like no inconsistency issue
> arises, since nilfs_sufile_flush_cache_node() never inserts new
> blocks, as you pointed out.  We also have to watch for inconsistency
> between the sufile header and the sufile data blocks, and the block
> count in the inode as well, but fortunately these look to be OK, too.
> 
> However, I still think it's better not to carry dirty blocks over to
> the next segment construction, both to avoid creating an extra
> checkpoint and to simplify things.
> 
> From this viewpoint, I also prefer that nilfs_sufile_flush_cache() and
> nilfs_sufile_flush_cache_node() are changed a bit so that they
> skip adjusting su_nlive_blks and su_nlive_lastmod if the sufile block
> that includes the segment usage is not marked dirty and only_mark == 0,
> as well as turning off live block counting temporarily after the
> sufile/DAT propagation loop.

OK, I'll start working on this.

Regards,
Andreas Rohner

>>
>> The only problem would be that I could lose some nlive_blks updates.
>>
> 
> Regards,
> Ryusuke Konishi
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



Re: [PATCH v2 7/9] nilfs2: ensure that all dirty blocks are written out

2015-05-31 Thread Ryusuke Konishi
On Sun, 10 May 2015 13:04:18 +0200, Andreas Rohner wrote:
> On 2015-05-09 14:17, Ryusuke Konishi wrote:
>> On Sun,  3 May 2015 12:05:20 +0200, Andreas Rohner wrote:
[...]
>> 
>> Uum. This still looks to have the potential to leak dirty blocks
>> between the DAT and SUFILE collection stages, since this retry is
>> limited by a fixed retry count.
>>
>> How about adding a function that temporarily turns off live block
>> tracking and using it after this propagation loop until the log
>> write finishes?
>>
>> It would reduce the accuracy of the live block count, but is that
>> good enough?  What do you think?  We have to eliminate the
>> possibility of the leak because it can cause file system corruption.
>> Every checkpoint must be self-contained.
> 
> How exactly could it lead to file system corruption? Maybe I am
> missing something important here, but it seems to me that no
> corruption is possible.
> 
> The nilfs_sufile_flush_cache_node() function only reads in already
> existing blocks. No new blocks are created. If I mark those blocks
> dirty, the btree is not changed at all. If I do not call
> nilfs_bmap_propagate(), then the btree stays unchanged and there are no
> dangling pointers. The resulting checkpoint should be self-contained.

Good point.  As for the btree, it looks like no inconsistency issue
arises, since nilfs_sufile_flush_cache_node() never inserts new
blocks, as you pointed out.  We also have to watch for inconsistency
between the sufile header and the sufile data blocks, and the block
count in the inode as well, but fortunately these look to be OK, too.

However, I still think it's better not to carry dirty blocks over to
the next segment construction, both to avoid creating an extra
checkpoint and to simplify things.

From this viewpoint, I also prefer that nilfs_sufile_flush_cache() and
nilfs_sufile_flush_cache_node() are changed a bit so that they
skip adjusting su_nlive_blks and su_nlive_lastmod if the sufile block
that includes the segment usage is not marked dirty and only_mark == 0,
as well as turning off live block counting temporarily after the
sufile/DAT propagation loop.
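
The suggested skip condition can be sketched in user space roughly as
follows (a minimal model; the struct layout and the helper name
flush_segusage() are hypothetical illustrations, not the actual nilfs2
code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a SUFILE segment usage entry. */
struct segusage {
	uint32_t nlive_blks;
	uint64_t nlive_lastmod;
};

/*
 * Apply a cached live-block update only when it cannot dirty a clean
 * SUFILE block behind the segment constructor's back: either the block
 * is already dirty, or this is an only_mark pass (which writes no
 * values and just reports that the block would be dirtied).
 */
static bool flush_segusage(struct segusage *su, bool block_dirty,
			   bool only_mark, uint32_t new_nlive, uint64_t now)
{
	if (!block_dirty && !only_mark)
		return false;	/* skip: would dirty a clean block */
	if (!only_mark) {
		su->nlive_blks = new_nlive;
		su->nlive_lastmod = now;
	}
	return true;
}
```

Under this rule a real flush never touches a SUFILE block that the
propagation loop has not already marked dirty, so no dirty blocks are
carried over into the next segment construction.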

> 
> The only problem would be that I could lose some nlive_blks updates.
> 

Regards,
Ryusuke Konishi


Re: [PATCH v2 7/9] nilfs2: ensure that all dirty blocks are written out

2015-05-09 Thread Andreas Rohner
On 2015-05-09 14:17, Ryusuke Konishi wrote:
> On Sun,  3 May 2015 12:05:20 +0200, Andreas Rohner wrote:
>> This patch ensures that all dirty blocks are written out if the segment
>> construction mode is SC_LSEG_SR. Scanning the DAT file can cause
>> blocks in the SUFILE to be dirtied, and newly dirtied blocks in the
>> SUFILE can in turn dirty more blocks in the DAT file. Since one of
>> these stages has to happen before the other during segment
>> construction, we end up with unwritten dirty blocks that are lost
>> in case of a file system unmount.
>>
>> This patch introduces a new set of file scanning operations that
>> only propagate the changes to the bmap and do not add anything to the
>> segment buffer. The DAT file and SUFILE are scanned with these
>> operations. The function nilfs_sufile_flush_cache() is called in between
>> these scans with the parameter only_mark set. That way it can be called
>> repeatedly without actually writing anything to the SUFILE. If there are
>> no new blocks dirtied in the flush, the normal segment construction
>> stages can safely continue.
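
The back-and-forth described above can be modeled as a fixpoint loop
(a toy user-space sketch; propagate_dat() and flush_sufile_only_mark()
are stand-ins for the real collection stages, and the feedback is just
a hypothetical counter that dies out after a few rounds):

```c
#include <assert.h>

/*
 * Toy model of the DAT/SUFILE coupling: scanning the DAT can dirty
 * SUFILE blocks, and flushing the SUFILE cache with only_mark set
 * (so nothing is written) can dirty DAT blocks again.  The loop runs
 * until a flush dirties nothing new, instead of a fixed retry count.
 */
static int feedback_rounds = 3;	/* hypothetical amount of feedback */

static void propagate_dat(void)
{
	/* propagate dirty DAT blocks to the bmap (no-op in this model) */
}

static int flush_sufile_only_mark(void)
{
	if (feedback_rounds == 0)
		return 0;	/* no new blocks dirtied: fixpoint reached */
	feedback_rounds--;
	return 1;	/* dirtied a DAT block: another pass is needed */
}

static int run_propagation(void)
{
	int rounds = 0;

	for (;;) {
		propagate_dat();
		rounds++;
		if (flush_sufile_only_mark() == 0)
			break;
	}
	return rounds;	/* normal construction stages may now continue */
}
```

Because the loop only stops once a flush dirties nothing new, every
dirty block is picked up in the current construction rather than being
deferred to the next one.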
>>
>> Signed-off-by: Andreas Rohner 
>> ---
>>  fs/nilfs2/segment.c | 73 -
>>  fs/nilfs2/segment.h |  3 ++-
>>  2 files changed, 74 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
>> index 14e76c3..ab8df33 100644
>> --- a/fs/nilfs2/segment.c
>> +++ b/fs/nilfs2/segment.c
>> @@ -579,6 +579,12 @@ static int nilfs_collect_dat_data(struct nilfs_sc_info *sci,
>>  return err;
>>  }
>>  
>> +static int nilfs_collect_prop_data(struct nilfs_sc_info *sci,
>> +  struct buffer_head *bh, struct inode *inode)
>> +{
>> +return nilfs_bmap_propagate(NILFS_I(inode)->i_bmap, bh);
>> +}
>> +
>>  static int nilfs_collect_dat_bmap(struct nilfs_sc_info *sci,
>>struct buffer_head *bh, struct inode *inode)
>>  {
>> @@ -613,6 +619,14 @@ static struct nilfs_sc_operations nilfs_sc_dat_ops = {
>>  .write_node_binfo = nilfs_write_dat_node_binfo,
>>  };
>>  
>> +static struct nilfs_sc_operations nilfs_sc_prop_ops = {
>> +.collect_data = nilfs_collect_prop_data,
>> +.collect_node = nilfs_collect_file_node,
>> +.collect_bmap = NULL,
>> +.write_data_binfo = NULL,
>> +.write_node_binfo = NULL,
>> +};
>> +
>>  static struct nilfs_sc_operations nilfs_sc_dsync_ops = {
>>  .collect_data = nilfs_collect_file_data,
>>  .collect_node = NULL,
>> @@ -998,7 +1012,8 @@ static int nilfs_segctor_scan_file(struct nilfs_sc_info *sci,
>>  err = nilfs_segctor_apply_buffers(
>>  sci, inode, &data_buffers,
>>  sc_ops->collect_data);
>> -BUG_ON(!err); /* always receive -E2BIG or true error */
>> +/* always receive -E2BIG or true error (NOT ANYMORE?)*/
>> +/* BUG_ON(!err); */
>>  goto break_or_fail;
>>  }
>>  }
> 
> If n > rest, this function will exit without scanning node buffers
> for nilfs_segctor_propagate_sufile().  This looks like a problem,
> right?
> 
> I think adding separate functions is better.  For instance,
> 
> static int nilfs_propagate_buffer(struct nilfs_sc_info *sci,
> struct buffer_head *bh,
> struct inode *inode)
> {
>   return nilfs_bmap_propagate(NILFS_I(inode)->i_bmap, bh);
> }
> 
> static int nilfs_segctor_propagate_file(struct nilfs_sc_info *sci,
>   struct inode *inode)
> {
>   LIST_HEAD(buffers);
>   size_t n;
>   int ret;
> 
>   n = nilfs_lookup_dirty_data_buffers(inode, &buffers, SIZE_MAX, 0,
>   LLONG_MAX);
>   if (n > 0) {
>   ret = nilfs_segctor_apply_buffers(sci, inode, &buffers,
> nilfs_propagate_buffer);
>   if (unlikely(ret))
>   goto fail;
>   }
> 
>   nilfs_lookup_dirty_node_buffers(inode, &buffers);
>   ret = nilfs_segctor_apply_buffers(sci, inode, &buffers,
> nilfs_propagate_buffer);
> fail:
>   return ret;
> }
> 
> With this, you can also avoid defining nilfs_sc_prop_ops, nor touching
> the BUG_ON() in nilfs_segctor_scan_file.

I agree this is a much nicer solution.

>> @@ -1055,6 +1070,55 @@ static int nilfs_segctor_scan_file_dsync(struct nilfs_sc_info *sci,
>>  return err;
>>  }
>>  
>> +/**
>> + * nilfs_segctor_propagate_sufile - dirties all needed SUFILE blocks
>> + * @sci: nilfs_sc_info
>> + *
>> + * Description: Dirties and propagates all SUFILE blocks that need to be
>> + * available later in the segment construction process, when the SUFILE cache
>> + * is flushed. Here the SUFILE cache is not actually flushed, but the blocks
>> + * that are needed fo