Re: [PATCH 0/9] nilfs2: implementation of cost-benefit GC policy

2015-03-14 Thread Andreas Rohner
Hi Ryusuke,

Thank you very much for your detailed review and feedback. I agree with
all of your points and I will start working on a rewrite immediately.

On 2015-03-12 13:54, Ryusuke Konishi wrote:
 Hi Andreas,
 
 On Tue, 10 Mar 2015 21:37:50 +0100, Andreas Rohner wrote:
 Hi Ryusuke,

 Thanks for your thorough review.

 On 2015-03-10 06:21, Ryusuke Konishi wrote:
 Hi Andreas,

 I looked through the whole kernel patchset and a part of the util patches.
 Overall comments are as follows:

 [Algorithm]
 As for the algorithm, it looks about OK except for the starvation
 countermeasure.  The starvation countermeasure looks ad hoc/hacky, but
 it's good that it doesn't change the kernel/userland interface; we may be
 able to replace it with better ways in the future or in a revised
 version of this patchset.

 (1) Drawback of the starvation countermeasure
 The patch 9/9 looks like it makes the execution time of the chcp
 operation worse, since it scans through the sufile to modify live
 block counters.  How much does it prolong the execution time?

 I'll do some tests, but I haven't noticed any significant performance
 drop.  The GC basically does the same thing every time it selects
 segments to reclaim.
 
 GC is performed in the background by an independent process.  What I
 care about is that the NILFS_IOCTL_CHANGE_CPMODE ioctl is called from a
 command line interface or an application.  They differ in this respect.
 
 Was a worst case scenario considered in the test?
 
 For example:
 1. Fill a TB class drive with data file(s), and make a snapshot on it.
 2. Run one pass GC to update snapshot block counts.
 3. And do chcp cp
 
 If we don't observe noticeable delay on this class of drive, then I
 think we can put the problem off.
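
For context, "chcp cp"/"chcp ss" boil down to the NILFS_IOCTL_CHANGE_CPMODE
ioctl discussed above.  A hedged userland sketch of roughly what such a call
looks like; the helper is illustrative, not actual nilfs-utils code, and the
uapi header name varies by kernel version:

  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/types.h>
  #include <linux/nilfs2_fs.h>	/* <linux/nilfs2_api.h> on newer kernels */

  /* Turn checkpoint cno into a plain checkpoint (NILFS_CHECKPOINT) or a
   * snapshot (NILFS_SNAPSHOT).  The ioctl is issued on a path inside the
   * mounted filesystem, e.g. the mount point. */
  static int change_cpmode(const char *mntpoint, __u64 cno, int mode)
  {
  	struct nilfs_cpmode cpmode = { .cm_cno = cno, .cm_mode = mode };
  	int fd, ret;

  	fd = open(mntpoint, O_RDONLY);
  	if (fd < 0)
  		return -1;
  	ret = ioctl(fd, NILFS_IOCTL_CHANGE_CPMODE, &cpmode);
  	close(fd);
  	return ret;
  }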

Yesterday I did a worst case test as you suggested. I used an old 1 TB
hard drive I had lying around. This was my setup:

1. Write an 850 GB file
2. Create a snapshot
3. Delete the file
4. Let GC run through all segments
5. Verify with lssu that the GC has updated all SUFILE entries
6. Drop the page cache
7. chcp cp

The following results are with the page cache dropped immediately before
each call:

1. chcp ss
real    0m1.337s
user    0m0.017s
sys     0m0.030s

2. chcp cp
real    0m6.377s
user    0m0.023s
sys     0m0.053s

The following results are without the drop of the page cache:

1. chcp ss
real    0m0.137s
user    0m0.010s
sys     0m0.000s

2. chcp cp
real    0m0.016s
user    0m0.010s
sys     0m0.007s

There are 119233 segments in my test. Each SUFILE entry uses 32 bytes.
So the worst case for 1 TB with 8 MB segments would be 3.57 MB of random
reads and one 3.57 MB continuous write. It only takes 6.377s because my
hard drive is so slow. You wouldn't notice any difference on a modern
SSD. Furthermore, the SUFILE is also scanned by the segment allocation
algorithm and the GC, so it is very likely already in the page cache.

 In a typical use case of nilfs, many snapshots are created and then
 automatically changed back to plain checkpoints as old snapshots are
 thinned out over time.  The patch 9/9 may impact such usage.

 (2) Compatibility
 What will happen in the following case:
 1. Create a file system, use it with the new module, and
create snapshots.
 2. Mount it with an old module, and release snapshot with chcp cp
 3. Mount it with the new module, and cleanerd runs gc with
cost benefit or greedy policy.

 Some segments could be subject to starvation, but it would probably only
 affect a small number of segments, and it could be fixed by "chcp ss
 CP; chcp cp CP".
 
 Ok, let's treat this as a restriction for now.
 If you come up with any good idea, please propose it.
 
 (3) Durability against unexpected power failures (just a note)
 The current patchset looks like it does not cause the starvation issue
 even when an unexpected power failure occurs during or after executing
 chcp cp, because nilfs_ioctl_change_cpmode() makes its changes in a
 transactional way with nilfs_transaction_begin/commit.
 We should always think about this kind of situation to keep consistency.
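
A minimal sketch of the transactional pattern referred to in this note,
assuming the standard helpers; the wrapper function and do_change_cpmode()
are hypothetical:

  static int example_change_cpmode(struct super_block *sb, __u64 cno, int mode)
  {
  	struct nilfs_transaction_info ti;
  	int ret;

  	ret = nilfs_transaction_begin(sb, &ti, 0);
  	if (ret)
  		return ret;

  	ret = do_change_cpmode(sb, cno, mode);	/* hypothetical helper */
  	if (ret < 0) {
  		nilfs_transaction_abort(sb);	/* no partial state reaches disk */
  		return ret;
  	}
  	return nilfs_transaction_commit(sb);	/* all changes land atomically */
  }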

 [Coding Style]
 (4) This patchset has several coding style issues. Please fix them and
 re-check with the latest checkpatch script (scripts/checkpatch.pl).

 I'll fix that. Sorry.

 patch 2:
 WARNING: Prefer kmalloc_array over kmalloc with multiply
 #85: FILE: fs/nilfs2/sufile.c:1192:
 +mc->mc_mods = kmalloc(capacity * sizeof(struct nilfs_sufile_mod),

 patch 5,6:
 WARNING: 'aquired' may be misspelled - perhaps 'acquired'?
 #60: 
 the same semaphore has to be aquired. So if the DAT-Entry belongs to

 WARNING: 'aquired' may be misspelled - perhaps 'acquired'?
 #46: 
 be aquired, which blocks the entire SUFILE and effectively turns

 WARNING: 'aquired' may be misspelled - perhaps 'acquired'?
 #53: 
 afore mentioned lock only needs to be aquired, if the cache is full
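
For the kmalloc_array warning above, the checkpatch-suggested fix would look
roughly like the following; the GFP flags are an assumption, since the
original allocation flags are not quoted:

  	mc->mc_mods = kmalloc_array(capacity, sizeof(struct nilfs_sufile_mod),
  				    GFP_NOFS);	/* overflow-checked allocation */
  	if (!mc->mc_mods)
  		return -ENOMEM;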

 (5) sub_sizeof macro:
 The same definition exists as offsetofend() in vfio.h,
 and a patch to move it to stddef.h is now proposed.

Re: [PATCH 0/9] nilfs2: implementation of cost-benefit GC policy

2015-03-14 Thread Ryusuke Konishi
On Sat, 14 Mar 2015 13:24:25 +0100, Andreas Rohner wrote:
 Hi Ryusuke,
 
 Thank you very much for your detailed review and feedback. I agree with
 all of your points and I will start working on a rewrite immediately.
 
 On 2015-03-12 13:54, Ryusuke Konishi wrote:
 Hi Andreas,
 
 On Tue, 10 Mar 2015 21:37:50 +0100, Andreas Rohner wrote:
 Hi Ryusuke,

 Thanks for your thorough review.

 On 2015-03-10 06:21, Ryusuke Konishi wrote:
 Hi Andreas,

 I looked through the whole kernel patchset and a part of the util patches.
 Overall comments are as follows:

 [Algorithm]
 As for the algorithm, it looks about OK except for the starvation
 countermeasure.  The starvation countermeasure looks ad hoc/hacky, but
 it's good that it doesn't change the kernel/userland interface; we may be
 able to replace it with better ways in the future or in a revised
 version of this patchset.

 (1) Drawback of the starvation countermeasure
 The patch 9/9 looks like it makes the execution time of the chcp
 operation worse, since it scans through the sufile to modify live
 block counters.  How much does it prolong the execution time?

 I'll do some tests, but I haven't noticed any significant performance
 drop.  The GC basically does the same thing every time it selects
 segments to reclaim.
 
 GC is performed in the background by an independent process.  What I
 care about is that the NILFS_IOCTL_CHANGE_CPMODE ioctl is called from a
 command line interface or an application.  They differ in this respect.
 
 Was a worst case scenario considered in the test?
 
 For example:
 1. Fill a TB class drive with data file(s), and make a snapshot on it.
 2. Run one pass GC to update snapshot block counts.
 3. And do chcp cp
 
 If we don't observe noticeable delay on this class of drive, then I
 think we can put the problem off.
 
 Yesterday I did a worst case test as you suggested. I used an old 1 TB
 hard drive I had lying around. This was my setup:
 
 1. Write an 850 GB file
 2. Create a snapshot
 3. Delete the file
 4. Let GC run through all segments
 5. Verify with lssu that the GC has updated all SUFILE entries
 6. Drop the page cache
 7. chcp cp
 
 The following results are with the page cache dropped immediately before
 each call:
 
 1. chcp ss
 real  0m1.337s
 user  0m0.017s
 sys   0m0.030s
 
 2. chcp cp
 real  0m6.377s
 user  0m0.023s
 sys   0m0.053s
 
 The following results are without the drop of the page cache:
 
 1. chcp ss
 real  0m0.137s
 user  0m0.010s
 sys   0m0.000s
 
 2. chcp cp
 real  0m0.016s
 user  0m0.010s
 sys   0m0.007s
 
 There are 119233 segments in my test. Each SUFILE entry uses 32 bytes.
 So the worst case for 1 TB with 8 MB segments would be 3.57 MB of random
 reads and one 3.57 MB continuous write. It only takes 6.377s because my
 hard drive is so slow. You wouldn't notice any difference on a modern
 SSD. Furthermore, the SUFILE is also scanned by the segment allocation
 algorithm and the GC, so it is very likely already in the page cache.

6.377s is too long because nilfs_sufile_fix_starving_segs() locks the
sufile mi_sem and even lengthens the lock period of the following locks:

 - cpfile mi_sem (held at nilfs_cpfile_clear_snapshot()).
 - transaction lock (held at nilfs_ioctl_change_cpmode()).
 - ns_snapshot_mount_mutex (held at nilfs_ioctl_change_cpmode()).

leading to a freeze of all write operations, lssu, lscp, cleanerd,
snapshot mounts, etc.

It is preferable for the function to be moved outside of them and to
release/reacquire the transaction lock and sufile mi_sem periodically in
some way; see the sketch below.
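
Such a chunked approach might look roughly like this sketch.
nilfs_transaction_begin/commit/abort, NILFS_MDT() and mi_sem are real
nilfs2 primitives; the function itself, the batch size, and the range
helper are hypothetical:

  static int nilfs_fix_starving_segs_chunked(struct super_block *sb,
  					   struct the_nilfs *nilfs)
  {
  	struct nilfs_transaction_info ti;
  	struct inode *sufile = nilfs->ns_sufile;
  	__u64 segnum = 0, nsegs = nilfs->ns_nsegments;
  	int ret = 0;

  	while (segnum < nsegs) {
  		/* batch size is illustrative */
  		__u64 end = min_t(__u64, segnum + 1024, nsegs);

  		ret = nilfs_transaction_begin(sb, &ti, 0);
  		if (ret)
  			break;

  		down_write(&NILFS_MDT(sufile)->mi_sem);
  		/* hypothetical helper: update live block counters of
  		 * segments in [segnum, end) */
  		ret = nilfs_sufile_fix_starving_segs_range(sufile, segnum, end);
  		up_write(&NILFS_MDT(sufile)->mi_sem);

  		if (ret < 0) {
  			nilfs_transaction_abort(sb);
  			break;
  		}
  		ret = nilfs_transaction_commit(sb);
  		if (ret < 0)
  			break;
  		segnum = end;	/* both locks are free between iterations */
  	}
  	return ret;
  }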

Regards,
Ryusuke Konishi


Re: [PATCH 0/9] nilfs2: implementation of cost-benefit GC policy

2015-03-12 Thread Ryusuke Konishi
Hi Andreas,

On Tue, 10 Mar 2015 21:37:50 +0100, Andreas Rohner wrote:
 Hi Ryusuke,
 
 Thanks for your thorough review.
 
 On 2015-03-10 06:21, Ryusuke Konishi wrote:
 Hi Andreas,
 
 I looked through the whole kernel patchset and a part of the util patches.
 Overall comments are as follows:
 
 [Algorithm]
 As for the algorithm, it looks about OK except for the starvation
 countermeasure.  The starvation countermeasure looks ad hoc/hacky, but
 it's good that it doesn't change the kernel/userland interface; we may be
 able to replace it with better ways in the future or in a revised
 version of this patchset.
 
 (1) Drawback of the starvation countermeasure
 The patch 9/9 looks like it makes the execution time of the chcp
 operation worse, since it scans through the sufile to modify live
 block counters.  How much does it prolong the execution time?
 
 I'll do some tests, but I haven't noticed any significant performance
 drop.  The GC basically does the same thing every time it selects
 segments to reclaim.

GC is performed in the background by an independent process.  What I
care about is that the NILFS_IOCTL_CHANGE_CPMODE ioctl is called from a
command line interface or an application.  They differ in this respect.

Was a worst case scenario considered in the test?

For example:
1. Fill a TB class drive with data file(s), and make a snapshot on it.
2. Run one pass GC to update snapshot block counts.
3. And do chcp cp

If we don't observe noticeable delay on this class of drive, then I
think we can put the problem off.

 In a typical use case of nilfs, many snapshots are created and then
 automatically changed back to plain checkpoints as old snapshots are
 thinned out over time.  The patch 9/9 may impact such usage.

 (2) Compatibility
 What will happen in the following case:
 1. Create a file system, use it with the new module, and
create snapshots.
 2. Mount it with an old module, and release snapshot with chcp cp
 3. Mount it with the new module, and cleanerd runs gc with
cost benefit or greedy policy.
 
 Some segments could be subject to starvation, but it would probably only
 affect a small number of segments, and it could be fixed by "chcp ss
 CP; chcp cp CP".

Ok, let's treat this as a restriction for now.
If you come up with any good idea, please propose it.

 (3) Durability against unexpected power failures (just a note)
 The current patchset looks like it does not cause the starvation issue
 even when an unexpected power failure occurs during or after executing
 chcp cp, because nilfs_ioctl_change_cpmode() makes its changes in a
 transactional way with nilfs_transaction_begin/commit.
 We should always think about this kind of situation to keep consistency.
 
 [Coding Style]
 (4) This patchset has several coding style issues. Please fix them and
 re-check with the latest checkpatch script (scripts/checkpatch.pl).
 
 I'll fix that. Sorry.
 
 patch 2:
 WARNING: Prefer kmalloc_array over kmalloc with multiply
 #85: FILE: fs/nilfs2/sufile.c:1192:
 +mc->mc_mods = kmalloc(capacity * sizeof(struct nilfs_sufile_mod),
 
 patch 5,6:
 WARNING: 'aquired' may be misspelled - perhaps 'acquired'?
 #60: 
 the same semaphore has to be aquired. So if the DAT-Entry belongs to
 
 WARNING: 'aquired' may be misspelled - perhaps 'acquired'?
 #46: 
 be aquired, which blocks the entire SUFILE and effectively turns
 
 WARNING: 'aquired' may be misspelled - perhaps 'acquired'?
 #53: 
 afore mentioned lock only needs to be aquired, if the cache is full
 
 (5) sub_sizeof macro:
 The same definition exists as offsetofend() in vfio.h,
 and a patch to move it to stddef.h is now proposed.
 
 Please use the same name, and redefine it only if it's not
 defined:
 
 #ifndef offsetofend
 #define offsetofend(TYPE, MEMBER) \
 (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))
 #endif
 
 Ok I'll change that.
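
For illustration, what offsetofend() computes; the struct here is made up
for the example:

  struct example {
  	__u64 a;	/* occupies bytes 0..7   */
  	__u32 b;	/* occupies bytes 8..11  */
  	__u32 c;	/* occupies bytes 12..15 */
  };

  /*
   * offsetofend(struct example, b)
   *   == offsetof(struct example, b) + sizeof(((struct example *)0)->b)
   *   == 8 + 4 == 12,
   * i.e. the number of bytes up to and including member b.
   */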
 
 [Implementation]
 (6) b_blocknr
 Please do not use bh->b_blocknr to store the disk block number.  This
 field is used to keep the virtual block number except for DAT files.
 It is only replaced with an actual block number when calling
 submit_bh().  Keep this policy.
 
 As far as I can tell, this is only true for blocks of GC inodes and node
 blocks. All other buffer_heads are always mapped to on-disk blocks by
 nilfs_get_block(). I only added the mapping in nilfs_segbuf_submit_bh()
 to correctly set the value in b_blocknr to the new location.
 

nilfs_get_block() is only used for regular files, directories, and so
on.  Blocks of metadata files are mapped through
nilfs_mdt_submit_block().  Anyway, yes, they store the actual disk block
number in b_blocknr in the current implementation.  But that is just a
corner cut in the current implementation, which comes from the fact
that we have to set actual disk block numbers when reading blocks with
vfs/mm functions.
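
To illustrate the policy being described, a sketch (not the patch's actual
code): b_blocknr carries a virtual block number until just before
submission, where it is translated through the DAT.  nilfs_dat_translate()
is the real translation helper; the wrapper is hypothetical:

  /* Kernel-side fragment; assumes nilfs2's internal headers (fs/nilfs2/dat.h).
   * Maps a buffer's virtual block number to an actual disk block number
   * only at submission time, per the policy above. */
  static int example_map_for_submit(struct inode *dat, struct buffer_head *bh)
  {
  	sector_t pblocknr;
  	int err;

  	err = nilfs_dat_translate(dat, bh->b_blocknr, &pblocknr);
  	if (err < 0)
  		return err;	/* e.g. the virtual block is not assigned yet */

  	bh->b_blocknr = pblocknr;	/* actual disk block number, only now */
  	return 0;
  }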

Anyway, I don't like that you touch nilfs_get_block() and
nilfs_segbuf_submit_bh() as part of the big patch.  At least, it