Re: [PATCH v2 9/9] nilfs2: prevent starvation of segments protected by snapshots

2015-05-20 Thread Ryusuke Konishi
On Sun,  3 May 2015 12:05:22 +0200, Andreas Rohner wrote:
 It doesn't really matter if the number of reclaimable blocks for a
 segment is inaccurate, as long as the overall performance is better than
 the simple timestamp algorithm and starvation is prevented.
 
 The following steps will lead to starvation of a segment:
 
 1. The segment is written
 2. A snapshot is created
 3. The files in the segment are deleted and the number of live
blocks for the segment is decremented to a very low value
 4. The GC tries to free the segment, but there are no reclaimable
blocks, because they are all protected by the snapshot. To prevent an
infinite loop the GC has to adjust the number of live blocks to the
correct value.
 5. The snapshot is converted to a checkpoint and the blocks in the
segment are now reclaimable.
 6. The GC will never attempt to clean the segment again, because it
looks as if it had a high number of live blocks.
 
 To prevent this, the already existing padding field of the SUFILE entry
 is used to track the number of snapshot blocks in the segment. This
 number is only set by the GC, since it collects the necessary
 information anyway. So there is no need to track which block belongs to
 which segment. In step 4 of the list above the GC will set the new field
 su_nsnapshot_blks. In step 5 all entries in the SUFILE are checked and
 entries with a big su_nsnapshot_blks field get their su_nlive_blks field
 reduced.
 
 Signed-off-by: Andreas Rohner andreas.roh...@gmx.net

I still don't know whether this workaround is the way we should take
or not.  This patch has several drawbacks:

 1. It introduces overhead to every chcp cp operation
    due to the traversal rewrite of the sufile.
    If the ratio of snapshot-protected blocks is high, then
    this overhead will be big.

 2. The traversal rewrite of the sufile will cause many sufile blocks to
    be written out.  If most blocks are protected by a snapshot,
    more than 4MB of sufile blocks will be written per 1TB capacity.

    Even though this rewrite may not happen for consecutive chcp cp
    operations, it still has potential for creating sufile block writes
    if the application of nilfs manipulates snapshots frequently.

 3. The ratio of the threshold max_segblks is hard-coded to 50%
    of blocks_per_segment.  It is not clear whether this ratio is good
    (versatile).

I will add comments inline below.

 ---
  fs/nilfs2/ioctl.c  | 50 +++-
  fs/nilfs2/sufile.c | 85 ++
  fs/nilfs2/sufile.h |  3 ++
  3 files changed, 137 insertions(+), 1 deletion(-)
 
 diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
 index 40bf74a..431725f 100644
 --- a/fs/nilfs2/ioctl.c
 +++ b/fs/nilfs2/ioctl.c
  @@ -200,6 +200,49 @@ static int nilfs_ioctl_getversion(struct inode *inode, void __user *argp)
  }
  
  /**
 + * nilfs_ioctl_fix_starving_segs - fix potentially starving segments
 + * @nilfs: nilfs object
 + * @inode: inode object
 + *
  + * Description: Scans for segments that are potentially starving and
 + * reduces the number of live blocks to less than half of the maximum
 + * number of blocks in a segment. This requires a scan of the whole SUFILE,
  + * which can take a long time on certain devices and under certain conditions.
 + * To avoid blocking other file system operations for too long the SUFILE is
 + * scanned in steps of NILFS_SUFILE_STARVING_SEGS_STEP. After each step the
 + * locks are released and cond_resched() is called.
 + *
 + * Return Value: On success, 0 is returned and on error, one of the
 + * following negative error codes is returned.
 + *
 + * %-EIO - I/O error.
 + *
 + * %-ENOMEM - Insufficient amount of memory available.
 + */

 +static int nilfs_ioctl_fix_starving_segs(struct the_nilfs *nilfs,
 +  struct inode *inode) {

This inode argument is meaningless for this routine.
Consider passing sb instead.

The function name fix starving segs feels odd to me.  It looks
like it gives a workaround rather than solving the root problem of gc
in nilfs.  What this patch is doing looks more like calibrating the
live block count.

 + struct nilfs_transaction_info ti;

 + unsigned long i, nsegs = nilfs_sufile_get_nsegments(nilfs->ns_sufile);

nsegs is set outside the transaction lock.

Since the file system can be resized (both shrunk and extended)
outside the lock, nsegs must be initialized or updated in the
section where the transaction lock is held.

 + int ret = 0;
 +
 + for (i = 0; i < nsegs; i += NILFS_SUFILE_STARVING_SEGS_STEP) {
 + nilfs_transaction_begin(inode->i_sb, &ti, 0);
 +
 + ret = nilfs_sufile_fix_starving_segs(nilfs->ns_sufile, i,
 + NILFS_SUFILE_STARVING_SEGS_STEP);
 + if (unlikely(ret < 0)) {
 + nilfs_transaction_abort(inode->i_sb);
 + break;
 + }
 +
 +