Re: [PATCH 1/9] nilfs2: refactor nilfs_sufile_updatev()

2015-03-12 Thread Ryusuke Konishi
On Tue, 24 Feb 2015 20:01:36 +0100, Andreas Rohner wrote:
> This patch refactors nilfs_sufile_updatev() to take an array of
> arbitrary data structures instead of an array of segment numbers as
> input parameter. With this  change it is reusable for cases, where
> it is necessary to pass extra data to the update function. The only
> requirement for the data structures passed as input is, that they
> contain the segment number within the structure. By passing the
> offset to the segment number as another input parameter,
> nilfs_sufile_updatev() can be oblivious to the actual type of the
> input structures in the array.
> 
> Signed-off-by: Andreas Rohner 
> ---
>  fs/nilfs2/sufile.c | 79 
> --
>  fs/nilfs2/sufile.h | 39 ++-
>  2 files changed, 68 insertions(+), 50 deletions(-)
> 
> diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
> index 2a869c3..1e8cac6 100644
> --- a/fs/nilfs2/sufile.c
> +++ b/fs/nilfs2/sufile.c
> @@ -138,14 +138,18 @@ unsigned long nilfs_sufile_get_ncleansegs(struct inode 
> *sufile)
>  /**
>   * nilfs_sufile_updatev - modify multiple segment usages at a time
>   * @sufile: inode of segment usage file
> - * @segnumv: array of segment numbers
> - * @nsegs: size of @segnumv array
> + * @datav: array of segment numbers
> + * @datasz: size of elements in @datav
> + * @segoff: offset to segnum within the elements of @datav
> + * @ndata: size of @datav array
>   * @create: creation flag
>   * @ndone: place to store number of modified segments on @segnumv
>   * @dofunc: primitive operation for the update
>   *
>   * Description: nilfs_sufile_updatev() repeatedly calls @dofunc
> - * against the given array of segments.  The @dofunc is called with
> + * against the given array of data elements. Every data element has
> + * to contain a valid segment number and @segoff should be the offset
> + * to that within the data structure. The @dofunc is called with
>   * buffers of a header block and the sufile block in which the target
>   * segment usage entry is contained.  If @ndone is given, the number
>   * of successfully modified segments from the head is stored in the
> @@ -163,50 +167,55 @@ unsigned long nilfs_sufile_get_ncleansegs(struct inode 
> *sufile)
>   *
>   * %-EINVAL - Invalid segment usage number
>   */
> -int nilfs_sufile_updatev(struct inode *sufile, __u64 *segnumv, size_t nsegs,
> -  int create, size_t *ndone,
> -  void (*dofunc)(struct inode *, __u64,
> +int nilfs_sufile_updatev(struct inode *sufile, void *datav, size_t datasz,
> +  size_t segoff, size_t ndata, int create,
> +  size_t *ndone,
> +  void (*dofunc)(struct inode *, void *,
>   struct buffer_head *,
>   struct buffer_head *))

Using offset byte of the data like segoff is nasty.

Please consider defining a template structure and its variation:

struct nilfs_sufile_update_data {
   __u64 segnum;
   /* Optional data comes after segnum */
};

/**
 * struct nilfs_sufile_update_count - data type of nilfs_sufile_do_xxx
 * @segnum: segment number
 * @nadd: additional value to a counter
 * Description: This structure derives from nilfs_sufile_update_data
 * struct.
 */
struct nilfs_sufile_update_count {
   __u64 segnum;
   __u64 nadd;
};

int nilfs_sufile_updatev(struct inode *sufile,
 struct nilfs_sufile_update_data *datav,
 size_t datasz,
 size_t ndata, int create, size_t *ndone,
 void (*dofunc)(struct inode *,
struct nilfs_sufile_update_data *,
struct buffer_head *,
struct buffer_head *))
{
...
}

If you need define segnum in the middle of structure, you can use
container_of():

Example:

struct nilfs_sufile_update_xxx {
   __u32 item_a;
   __u32 item_b;
   struct nilfs_sufile_update_data u_data;
};

static inline struct nilfs_sufile_update_xxx *
NILFS_SU_UPDATE_XXX(struct nilfs_sufile_update_data *data)
{
return container_of(data, struct nilfs_sufile_update_xxx, u_data);
}

void nilfs_sufile_do_xxx(...)
{
   struct nilfs_sufile_update_xxx *xxx;

   xxx = NILFS_SU_UPDATA_XXX(data);
   ...
}

I believe the former technique is enough in your case. (you can
suppose that segnum is always the first member of data, right ?).


>  {
>   struct buffer_head *header_bh, *bh;
>   unsigned long blkoff, prev_blkoff;
>   __u64 *seg;
> - size_t nerr = 0, n = 0;
> + void *data, *dataend = datav + ndata * datasz;
> + size_t n = 0;
>   int ret = 0;
>  
> - if (unlikely(nsegs == 0))
> + if (unlikely(ndata == 0))
>   goto out;
>  
> - down_write(&NILFS_MDT(sufile)->mi_sem);
> - for (seg =

Re: [PATCH 0/9] nilfs2: implementation of cost-benefit GC policy

2015-03-12 Thread Andreas Rohner
Hi Ryusuke,

Thanks for your thorough review.

On 2015-03-10 06:21, Ryusuke Konishi wrote:
> Hi Andreas,
> 
> I looked through whole kernel patches and a part of util patches.
> Overall comments are as follows:
> 
> [Algorithm]
> As for algorithm, it looks about OK except for the starvation
> countermeasure.  The stavation countermeasure looks adhoc/hacky, but
> it's good that it doesn't change kernel/userland interface; we may be
> able to replace it with better ways in a future or in a revised
> version of this patchset.
> 
> (1) Drawback of the starvation countermeasure
> The patch 9/9 looks to make the execution time of chcp operation
> worse since it will scan through sufile to modify live block
> counters.  How much does it prolong the execution time ?

I'll do some tests, but I haven't noticed any significant performance
drop. The GC basically does the same thing, every time it selects
segments to reclaim.

> In a use case of nilfs, many snapshots are created and they are
> automatically changed back to plain checkpoints because old
> snapshots are thinned out over time.  The patch 9/9 may impact on
> such usage.
>
> (2) Compatibility
> What will happen in the following case:
> 1. Create a file system, use it with the new module, and
>create snapshots.
> 2. Mount it with an old module, and release snapshot with "chcp cp"
> 3. Mount it with the new module, and cleanerd runs gc with
>cost benefit or greedy policy.

Some segments could be subject to starvation. But it would probably only
affect a small number of segments and it could be fixed by "chcp ss
; chcp cp ".

> (3) Durability against unexpected power failures (just a note)
> The current patchset looks not to cause starvation issue even when
> unexpected power failure occurs during or after executing "chcp
> cp" because nilfs_ioctl_change_cpmode() do changes in a
> transactional way with nilfs_transaction_begin/commit.
> We should always think this kind of situtation to keep consistency.
> 
> [Coding Style]
> (4) This patchset has several coding style issues. Please fix them and
> re-check with the latest checkpatch script (script/checkpatch.pl).

I'll fix that. Sorry.

> patch 2:
> WARNING: Prefer kmalloc_array over kmalloc with multiply
> #85: FILE: fs/nilfs2/sufile.c:1192:
> +mc->mc_mods = kmalloc(capacity * sizeof(struct nilfs_sufile_mod),
> 
> patch 5,6:
> WARNING: 'aquired' may be misspelled - perhaps 'acquired'?
> #60: 
> the same semaphore has to be aquired. So if the DAT-Entry belongs to
> 
> WARNING: 'aquired' may be misspelled - perhaps 'acquired'?
> #46: 
> be aquired, which blocks the entire SUFILE and effectively turns
> 
> WARNING: 'aquired' may be misspelled - perhaps 'acquired'?
> #53: 
> afore mentioned lock only needs to be aquired, if the cache is full
> 
> (5) sub_sizeof macro:
> The same definition exists as offsetofend() in vfio.h,
> and a patch to move it to stddef.h is now proposed.
> 
> Please use the same name, and redefine it only if it's not
> defined:
> 
> #ifndef offsetofend
> #define offsetofend(TYPE, MEMBER) \
> (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))
> #endif

Ok I'll change that.

> [Implementation]
> (6) b_blocknr
> Please do not use bh->b_blocknr to store disk block number.  This
> field is used to keep virtual block number except for DAT files.
> It is only replaced to an actual block number during calling
> submit_bh().  Keep this policy.

As far as I can tell, this is only true for blocks of GC inodes and node
blocks. All other buffer_heads are always mapped to on disk blocks by
nilfs_get_block(). I only added the mapping in nilfs_segbuf_submit_bh()
to correctly set the value in b_blocknr to the new location.

> In segment constructor context, you can calculate the disk block
> number from the start disk address of the segment and the block
> index (offset) in the segment.

If I understand you correctly, this approach would give me the on disk
location inside of the segment that is currently constructed. But I need
to know the previous on disk location of the buffer_head. I have to
decrement the counter for the previous segment.

> (7) sufile mod cache
> Consider gathering the cache into nilfs_sufile_info struct and
> stopping to pass it via argument of bmap/sufile/dat interface
> functions.  It's hacky, and decreases readability of programs, and
> is bloating changes of this patchset over multiple function
> blocks.

If I use a global structure, I have to protect it with a lock. Since
almost any operation has to modify the counters in the SUFILE, this
would serialize the whole file system.

> The cache should be well designed. It's important to balance the
> performance and locality/transparency of the feature.  For
> instance, it can be implemented with radix-tree of objects in
> which each object has a vector of 2^

Re: [PATCH 0/9] nilfs2: implementation of cost-benefit GC policy

2015-03-12 Thread Ryusuke Konishi
Hi Andreas,

On Tue, 10 Mar 2015 21:37:50 +0100, Andreas Rohner wrote:
> Hi Ryusuke,
> 
> Thanks for your thorough review.
> 
> On 2015-03-10 06:21, Ryusuke Konishi wrote:
>> Hi Andreas,
>> 
>> I looked through whole kernel patches and a part of util patches.
>> Overall comments are as follows:
>> 
>> [Algorithm]
>> As for algorithm, it looks about OK except for the starvation
>> countermeasure.  The stavation countermeasure looks adhoc/hacky, but
>> it's good that it doesn't change kernel/userland interface; we may be
>> able to replace it with better ways in a future or in a revised
>> version of this patchset.
>> 
>> (1) Drawback of the starvation countermeasure
>> The patch 9/9 looks to make the execution time of chcp operation
>> worse since it will scan through sufile to modify live block
>> counters.  How much does it prolong the execution time ?
> 
> I'll do some tests, but I haven't noticed any significant performance
> drop. The GC basically does the same thing, every time it selects
> segments to reclaim.

GC is performed in background by an independent process.  What I'm
care about it that NILFS_IOCTL_CHANGE_CPMODE ioctl is called from
command line interface or application.  They differ in this meaning.

Was a worse case senario considered in the test ?

For example:
1. Fill a TB class drive with data file(s), and make a snapshot on it.
2. Run one pass GC to update snapshot block counts.
3. And do "chcp cp"

If we don't observe noticeable delay on this class of drive, then I
think we can put the problem off.

>> In a use case of nilfs, many snapshots are created and they are
>> automatically changed back to plain checkpoints because old
>> snapshots are thinned out over time.  The patch 9/9 may impact on
>> such usage.
>>
>> (2) Compatibility
>> What will happen in the following case:
>> 1. Create a file system, use it with the new module, and
>>create snapshots.
>> 2. Mount it with an old module, and release snapshot with "chcp cp"
>> 3. Mount it with the new module, and cleanerd runs gc with
>>cost benefit or greedy policy.
> 
> Some segments could be subject to starvation. But it would probably only
> affect a small number of segments and it could be fixed by "chcp ss
> ; chcp cp ".

Ok, let's treat this as a restriction for now.
If you come up with any good idea, please propose.

>> (3) Durability against unexpected power failures (just a note)
>> The current patchset looks not to cause starvation issue even when
>> unexpected power failure occurs during or after executing "chcp
>> cp" because nilfs_ioctl_change_cpmode() do changes in a
>> transactional way with nilfs_transaction_begin/commit.
>> We should always think this kind of situtation to keep consistency.
>> 
>> [Coding Style]
>> (4) This patchset has several coding style issues. Please fix them and
>> re-check with the latest checkpatch script (script/checkpatch.pl).
> 
> I'll fix that. Sorry.
> 
>> patch 2:
>> WARNING: Prefer kmalloc_array over kmalloc with multiply
>> #85: FILE: fs/nilfs2/sufile.c:1192:
>> +mc->mc_mods = kmalloc(capacity * sizeof(struct nilfs_sufile_mod),
>> 
>> patch 5,6:
>> WARNING: 'aquired' may be misspelled - perhaps 'acquired'?
>> #60: 
>> the same semaphore has to be aquired. So if the DAT-Entry belongs to
>> 
>> WARNING: 'aquired' may be misspelled - perhaps 'acquired'?
>> #46: 
>> be aquired, which blocks the entire SUFILE and effectively turns
>> 
>> WARNING: 'aquired' may be misspelled - perhaps 'acquired'?
>> #53: 
>> afore mentioned lock only needs to be aquired, if the cache is full
>> 
>> (5) sub_sizeof macro:
>> The same definition exists as offsetofend() in vfio.h,
>> and a patch to move it to stddef.h is now proposed.
>> 
>> Please use the same name, and redefine it only if it's not
>> defined:
>> 
>> #ifndef offsetofend
>> #define offsetofend(TYPE, MEMBER) \
>> (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))
>> #endif
> 
> Ok I'll change that.
> 
>> [Implementation]
>> (6) b_blocknr
>> Please do not use bh->b_blocknr to store disk block number.  This
>> field is used to keep virtual block number except for DAT files.
>> It is only replaced to an actual block number during calling
>> submit_bh().  Keep this policy.
> 
> As far as I can tell, this is only true for blocks of GC inodes and node
> blocks. All other buffer_heads are always mapped to on disk blocks by
> nilfs_get_block(). I only added the mapping in nilfs_segbuf_submit_bh()
> to correctly set the value in b_blocknr to the new location.
> 

nilfs_get_block() is only used for regular files, directories, and so
on.  Blocks on metadata files are mapped through
nilfs_mdt_submit_block().  Anyway, yes, they stores actual disk block
number in b_blocknr in the current implementation.  But, it is just a
cutting corner of the current implementation, which comes from the
reason that we have to set actual disk bloc

[PATCH 3/7] nilfs2: use bgl_lock_ptr()

2015-03-12 Thread Ryusuke Konishi
Simplify nilfs_mdt_bgl_lock() by utilizing bgl_lock_ptr() helper in
.

Signed-off-by: Ryusuke Konishi 
---
 fs/nilfs2/mdt.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/nilfs2/mdt.h b/fs/nilfs2/mdt.h
index ab172e8..a294ea3 100644
--- a/fs/nilfs2/mdt.h
+++ b/fs/nilfs2/mdt.h
@@ -111,7 +111,10 @@ static inline __u64 nilfs_mdt_cno(struct inode *inode)
return ((struct the_nilfs *)inode->i_sb->s_fs_info)->ns_cno;
 }
 
-#define nilfs_mdt_bgl_lock(inode, bg) \
-   (&NILFS_MDT(inode)->mi_bgl->locks[(bg) & (NR_BG_LOCKS-1)].lock)
+static inline spinlock_t *
+nilfs_mdt_bgl_lock(struct inode *inode, unsigned int block_group)
+{
+   return bgl_lock_ptr(NILFS_MDT(inode)->mi_bgl, block_group);
+}
 
 #endif /* _NILFS_MDT_H */
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/7] nilfs2: add helper to find existent block on metadata file

2015-03-12 Thread Ryusuke Konishi
Add a new metadata file function, nilfs_mdt_find_block(), which finds
an existent block on a metadata file in a given range of blocks.  This
function skips continuous hole blocks efficiently by using
nilfs_bmap_seek_key().

Signed-off-by: Ryusuke Konishi 
---
 fs/nilfs2/mdt.c | 54 ++
 fs/nilfs2/mdt.h |  3 +++
 2 files changed, 57 insertions(+)

diff --git a/fs/nilfs2/mdt.c b/fs/nilfs2/mdt.c
index 892cf5f..dee34d9 100644
--- a/fs/nilfs2/mdt.c
+++ b/fs/nilfs2/mdt.c
@@ -261,6 +261,60 @@ int nilfs_mdt_get_block(struct inode *inode, unsigned long 
blkoff, int create,
 }
 
 /**
+ * nilfs_mdt_find_block - find and get a buffer on meta data file.
+ * @inode: inode of the meta data file
+ * @start: start block offset (inclusive)
+ * @end: end block offset (inclusive)
+ * @blkoff: block offset
+ * @out_bh: place to store a pointer to buffer_head struct
+ *
+ * nilfs_mdt_find_block() looks up an existing block in range of
+ * [@start, @end] and stores pointer to a buffer head of the block to
+ * @out_bh, and block offset to @blkoff, respectively.  @out_bh and
+ * @blkoff are substituted only when zero is returned.
+ *
+ * Return Value: On success, it returns 0. On error, the following negative
+ * error code is returned.
+ *
+ * %-ENOMEM - Insufficient memory available.
+ *
+ * %-EIO - I/O error
+ *
+ * %-ENOENT - no block was found in the range
+ */
+int nilfs_mdt_find_block(struct inode *inode, unsigned long start,
+unsigned long end, unsigned long *blkoff,
+struct buffer_head **out_bh)
+{
+   __u64 next;
+   int ret;
+
+   if (unlikely(start > end))
+   return -ENOENT;
+
+   ret = nilfs_mdt_read_block(inode, start, true, out_bh);
+   if (!ret) {
+   *blkoff = start;
+   goto out;
+   }
+   if (unlikely(ret != -ENOENT || start == ULONG_MAX))
+   goto out;
+
+   ret = nilfs_bmap_seek_key(NILFS_I(inode)->i_bmap, start + 1, &next);
+   if (!ret) {
+   if (next <= end) {
+   ret = nilfs_mdt_read_block(inode, next, true, out_bh);
+   if (!ret)
+   *blkoff = next;
+   } else {
+   ret = -ENOENT;
+   }
+   }
+out:
+   return ret;
+}
+
+/**
  * nilfs_mdt_delete_block - make a hole on the meta data file.
  * @inode: inode of the meta data file
  * @block: block offset
diff --git a/fs/nilfs2/mdt.h b/fs/nilfs2/mdt.h
index a294ea3..fe529a8 100644
--- a/fs/nilfs2/mdt.h
+++ b/fs/nilfs2/mdt.h
@@ -78,6 +78,9 @@ int nilfs_mdt_get_block(struct inode *, unsigned long, int,
void (*init_block)(struct inode *,
   struct buffer_head *, void *),
struct buffer_head **);
+int nilfs_mdt_find_block(struct inode *inode, unsigned long start,
+unsigned long end, unsigned long *blkoff,
+struct buffer_head **out_bh);
 int nilfs_mdt_delete_block(struct inode *, unsigned long);
 int nilfs_mdt_forget_block(struct inode *, unsigned long);
 int nilfs_mdt_mark_block_dirty(struct inode *, unsigned long);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/7] nilfs2: add bmap function to seek a valid key

2015-03-12 Thread Ryusuke Konishi
Add a new bmap function, nilfs_bmap_seek_key(), which seeks a valid
entry and returns its key starting from a given key.  This function
can be used to skip hole blocks efficiently.

Signed-off-by: Ryusuke Konishi 
---
 fs/nilfs2/bmap.c   | 31 +
 fs/nilfs2/bmap.h   |  5 -
 fs/nilfs2/btree.c  | 66 ++
 fs/nilfs2/direct.c | 17 ++
 4 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/fs/nilfs2/bmap.c b/fs/nilfs2/bmap.c
index c82f436..27f75bc 100644
--- a/fs/nilfs2/bmap.c
+++ b/fs/nilfs2/bmap.c
@@ -189,6 +189,37 @@ static int nilfs_bmap_do_delete(struct nilfs_bmap *bmap, 
__u64 key)
return bmap->b_ops->bop_delete(bmap, key);
 }
 
+/**
+ * nilfs_bmap_seek_key - seek a valid entry and return its key
+ * @bmap: bmap struct
+ * @start: start key number
+ * @keyp: place to store valid key
+ *
+ * Description: nilfs_bmap_seek_key() seeks a valid key on @bmap
+ * starting from @start, and stores it to @keyp if found.
+ *
+ * Return Value: On success, 0 is returned. On error, one of the following
+ * negative error codes is returned.
+ *
+ * %-EIO - I/O error.
+ *
+ * %-ENOMEM - Insufficient amount of memory available.
+ *
+ * %-ENOENT - No valid entry was found
+ */
+int nilfs_bmap_seek_key(struct nilfs_bmap *bmap, __u64 start, __u64 *keyp)
+{
+   int ret;
+
+   down_read(&bmap->b_sem);
+   ret = bmap->b_ops->bop_seek_key(bmap, start, keyp);
+   up_read(&bmap->b_sem);
+
+   if (ret < 0)
+   ret = nilfs_bmap_convert_error(bmap, __func__, ret);
+   return ret;
+}
+
 int nilfs_bmap_last_key(struct nilfs_bmap *bmap, __u64 *keyp)
 {
int ret;
diff --git a/fs/nilfs2/bmap.h b/fs/nilfs2/bmap.h
index 9230d33..bfa817c 100644
--- a/fs/nilfs2/bmap.h
+++ b/fs/nilfs2/bmap.h
@@ -76,8 +76,10 @@ struct nilfs_bmap_operations {
  union nilfs_binfo *);
int (*bop_mark)(struct nilfs_bmap *, __u64, int);
 
-   /* The following functions are internal use only. */
+   int (*bop_seek_key)(const struct nilfs_bmap *, __u64, __u64 *);
int (*bop_last_key)(const struct nilfs_bmap *, __u64 *);
+
+   /* The following functions are internal use only. */
int (*bop_check_insert)(const struct nilfs_bmap *, __u64);
int (*bop_check_delete)(struct nilfs_bmap *, __u64);
int (*bop_gather_data)(struct nilfs_bmap *, __u64 *, __u64 *, int);
@@ -155,6 +157,7 @@ void nilfs_bmap_write(struct nilfs_bmap *, struct 
nilfs_inode *);
 int nilfs_bmap_lookup_contig(struct nilfs_bmap *, __u64, __u64 *, unsigned);
 int nilfs_bmap_insert(struct nilfs_bmap *bmap, __u64 key, unsigned long rec);
 int nilfs_bmap_delete(struct nilfs_bmap *bmap, __u64 key);
+int nilfs_bmap_seek_key(struct nilfs_bmap *bmap, __u64 start, __u64 *keyp);
 int nilfs_bmap_last_key(struct nilfs_bmap *bmap, __u64 *keyp);
 int nilfs_bmap_truncate(struct nilfs_bmap *bmap, __u64 key);
 void nilfs_bmap_clear(struct nilfs_bmap *);
diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c
index ecdbae1..841d177 100644
--- a/fs/nilfs2/btree.c
+++ b/fs/nilfs2/btree.c
@@ -633,6 +633,44 @@ static int nilfs_btree_do_lookup_last(const struct 
nilfs_bmap *btree,
return 0;
 }
 
+/**
+ * nilfs_btree_get_next_key - get next valid key from btree path array
+ * @btree: bmap struct of btree
+ * @path: array of nilfs_btree_path struct
+ * @minlevel: start level
+ * @nextkey: place to store the next valid key
+ *
+ * Return Value: If a next key was found, 0 is returned. Otherwise,
+ * -ENOENT is returned.
+ */
+static int nilfs_btree_get_next_key(const struct nilfs_bmap *btree,
+   const struct nilfs_btree_path *path,
+   int minlevel, __u64 *nextkey)
+{
+   struct nilfs_btree_node *node;
+   int maxlevel = nilfs_btree_height(btree) - 1;
+   int index, next_adj, level;
+
+   /* Next index is already set to bp_index for leaf nodes. */
+   next_adj = 0;
+   for (level = minlevel; level <= maxlevel; level++) {
+   if (level == maxlevel)
+   node = nilfs_btree_get_root(btree);
+   else
+   node = nilfs_btree_get_nonroot_node(path, level);
+
+   index = path[level].bp_index + next_adj;
+   if (index < nilfs_btree_node_get_nchildren(node)) {
+   /* Next key is in this node */
+   *nextkey = nilfs_btree_node_get_key(node, index);
+   return 0;
+   }
+   /* For non-leaf nodes, next index is stored at bp_index + 1. */
+   next_adj = 1;
+   }
+   return -ENOENT;
+}
+
 static int nilfs_btree_lookup(const struct nilfs_bmap *btree,
  __u64 key, int level, __u64 *ptrp)
 {
@@ -1563,6 +1601,30 @@ out:
return ret;
 }
 
+static int nilfs_btree_seek_key(const struct nilfs_bmap *btree, __u64 start,
+   

[PATCH 2/7] nilfs2: use set_mask_bits() for operations on buffer state bitmap

2015-03-12 Thread Ryusuke Konishi
nilfs_forget_buffer(), nilfs_clear_dirty_page(), and
nilfs_segctor_complete_write() are using a bunch of atomic bit
operations against buffer state bitmap.

This reduces the number of them by utilizing set_mask_bits() macro.

Signed-off-by: Ryusuke Konishi 
---
 fs/nilfs2/page.c| 24 ++--
 fs/nilfs2/segment.c | 14 --
 2 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 700ecbc..45d650a 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -89,18 +89,16 @@ struct buffer_head *nilfs_grab_buffer(struct inode *inode,
 void nilfs_forget_buffer(struct buffer_head *bh)
 {
struct page *page = bh->b_page;
+   const unsigned long clear_bits =
+   (1 << BH_Uptodate | 1 << BH_Dirty | 1 << BH_Mapped |
+1 << BH_Async_Write | 1 << BH_NILFS_Volatile |
+1 << BH_NILFS_Checked | 1 << BH_NILFS_Redirected);
 
lock_buffer(bh);
-   clear_buffer_nilfs_volatile(bh);
-   clear_buffer_nilfs_checked(bh);
-   clear_buffer_nilfs_redirected(bh);
-   clear_buffer_async_write(bh);
-   clear_buffer_dirty(bh);
+   set_mask_bits(&bh->b_state, clear_bits, 0);
if (nilfs_page_buffers_clean(page))
__nilfs_clear_page_dirty(page);
 
-   clear_buffer_uptodate(bh);
-   clear_buffer_mapped(bh);
bh->b_blocknr = -1;
ClearPageUptodate(page);
ClearPageMappedToDisk(page);
@@ -421,6 +419,10 @@ void nilfs_clear_dirty_page(struct page *page, bool silent)
 
if (page_has_buffers(page)) {
struct buffer_head *bh, *head;
+   const unsigned long clear_bits =
+   (1 << BH_Uptodate | 1 << BH_Dirty | 1 << BH_Mapped |
+1 << BH_Async_Write | 1 << BH_NILFS_Volatile |
+1 << BH_NILFS_Checked | 1 << BH_NILFS_Redirected);
 
bh = head = page_buffers(page);
do {
@@ -430,13 +432,7 @@ void nilfs_clear_dirty_page(struct page *page, bool silent)
"discard block %llu, size %zu",
(u64)bh->b_blocknr, bh->b_size);
}
-   clear_buffer_async_write(bh);
-   clear_buffer_dirty(bh);
-   clear_buffer_nilfs_volatile(bh);
-   clear_buffer_nilfs_checked(bh);
-   clear_buffer_nilfs_redirected(bh);
-   clear_buffer_uptodate(bh);
-   clear_buffer_mapped(bh);
+   set_mask_bits(&bh->b_state, clear_bits, 0);
unlock_buffer(bh);
} while (bh = bh->b_this_page, bh != head);
}
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index c9a4e60..c6abbad9 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1785,12 +1786,13 @@ static void nilfs_segctor_complete_write(struct 
nilfs_sc_info *sci)
 */
list_for_each_entry(bh, &segbuf->sb_payload_buffers,
b_assoc_buffers) {
-   set_buffer_uptodate(bh);
-   clear_buffer_dirty(bh);
-   clear_buffer_async_write(bh);
-   clear_buffer_delay(bh);
-   clear_buffer_nilfs_volatile(bh);
-   clear_buffer_nilfs_redirected(bh);
+   const unsigned long set_bits = (1 << BH_Uptodate);
+   const unsigned long clear_bits =
+   (1 << BH_Dirty | 1 << BH_Async_Write |
+1 << BH_Delay | 1 << BH_NILFS_Volatile |
+1 << BH_NILFS_Redirected);
+
+   set_mask_bits(&bh->b_state, clear_bits, set_bits);
if (bh == segbuf->sb_super_root) {
if (bh->b_page != bd_page) {
end_page_writeback(bd_page);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/7] nilfs2: unify type of key arguments in bmap interface

2015-03-12 Thread Ryusuke Konishi
The type of key arguments in block mapping interface varies depending
on function.  For instance, nilfs_bmap_lookup_at_level() takes "__u64"
for its key argument whereas nilfs_bmap_lookup() takes "unsigned
long".

This fits them to "__u64" to eliminate the variation.

Signed-off-by: Ryusuke Konishi 
---
 fs/nilfs2/alloc.c |  5 +++--
 fs/nilfs2/bmap.c  | 17 ++---
 fs/nilfs2/bmap.h  |  8 
 fs/nilfs2/inode.c |  6 +++---
 4 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c
index 741fd02..8df0f3b 100644
--- a/fs/nilfs2/alloc.c
+++ b/fs/nilfs2/alloc.c
@@ -405,13 +405,14 @@ nilfs_palloc_rest_groups_in_desc_block(const struct inode 
*inode,
 static int nilfs_palloc_count_desc_blocks(struct inode *inode,
unsigned long *desc_blocks)
 {
-   unsigned long blknum;
+   __u64 blknum;
int ret;
 
ret = nilfs_bmap_last_key(NILFS_I(inode)->i_bmap, &blknum);
if (likely(!ret))
*desc_blocks = DIV_ROUND_UP(
-   blknum, NILFS_MDT(inode)->mi_blocks_per_desc_block);
+   (unsigned long)blknum,
+   NILFS_MDT(inode)->mi_blocks_per_desc_block);
return ret;
 }
 
diff --git a/fs/nilfs2/bmap.c b/fs/nilfs2/bmap.c
index aadbd0b..c82f436 100644
--- a/fs/nilfs2/bmap.c
+++ b/fs/nilfs2/bmap.c
@@ -152,9 +152,7 @@ static int nilfs_bmap_do_insert(struct nilfs_bmap *bmap, 
__u64 key, __u64 ptr)
  *
  * %-EEXIST - A record associated with @key already exist.
  */
-int nilfs_bmap_insert(struct nilfs_bmap *bmap,
- unsigned long key,
- unsigned long rec)
+int nilfs_bmap_insert(struct nilfs_bmap *bmap, __u64 key, unsigned long rec)
 {
int ret;
 
@@ -191,19 +189,16 @@ static int nilfs_bmap_do_delete(struct nilfs_bmap *bmap, 
__u64 key)
return bmap->b_ops->bop_delete(bmap, key);
 }
 
-int nilfs_bmap_last_key(struct nilfs_bmap *bmap, unsigned long *key)
+int nilfs_bmap_last_key(struct nilfs_bmap *bmap, __u64 *keyp)
 {
-   __u64 lastkey;
int ret;
 
down_read(&bmap->b_sem);
-   ret = bmap->b_ops->bop_last_key(bmap, &lastkey);
+   ret = bmap->b_ops->bop_last_key(bmap, keyp);
up_read(&bmap->b_sem);
 
if (ret < 0)
ret = nilfs_bmap_convert_error(bmap, __func__, ret);
-   else
-   *key = lastkey;
return ret;
 }
 
@@ -224,7 +219,7 @@ int nilfs_bmap_last_key(struct nilfs_bmap *bmap, unsigned 
long *key)
  *
  * %-ENOENT - A record associated with @key does not exist.
  */
-int nilfs_bmap_delete(struct nilfs_bmap *bmap, unsigned long key)
+int nilfs_bmap_delete(struct nilfs_bmap *bmap, __u64 key)
 {
int ret;
 
@@ -235,7 +230,7 @@ int nilfs_bmap_delete(struct nilfs_bmap *bmap, unsigned 
long key)
return nilfs_bmap_convert_error(bmap, __func__, ret);
 }
 
-static int nilfs_bmap_do_truncate(struct nilfs_bmap *bmap, unsigned long key)
+static int nilfs_bmap_do_truncate(struct nilfs_bmap *bmap, __u64 key)
 {
__u64 lastkey;
int ret;
@@ -276,7 +271,7 @@ static int nilfs_bmap_do_truncate(struct nilfs_bmap *bmap, 
unsigned long key)
  *
  * %-ENOMEM - Insufficient amount of memory available.
  */
-int nilfs_bmap_truncate(struct nilfs_bmap *bmap, unsigned long key)
+int nilfs_bmap_truncate(struct nilfs_bmap *bmap, __u64 key)
 {
int ret;
 
diff --git a/fs/nilfs2/bmap.h b/fs/nilfs2/bmap.h
index b89e680..9230d33 100644
--- a/fs/nilfs2/bmap.h
+++ b/fs/nilfs2/bmap.h
@@ -153,10 +153,10 @@ int nilfs_bmap_test_and_clear_dirty(struct nilfs_bmap *);
 int nilfs_bmap_read(struct nilfs_bmap *, struct nilfs_inode *);
 void nilfs_bmap_write(struct nilfs_bmap *, struct nilfs_inode *);
 int nilfs_bmap_lookup_contig(struct nilfs_bmap *, __u64, __u64 *, unsigned);
-int nilfs_bmap_insert(struct nilfs_bmap *, unsigned long, unsigned long);
-int nilfs_bmap_delete(struct nilfs_bmap *, unsigned long);
-int nilfs_bmap_last_key(struct nilfs_bmap *, unsigned long *);
-int nilfs_bmap_truncate(struct nilfs_bmap *, unsigned long);
+int nilfs_bmap_insert(struct nilfs_bmap *bmap, __u64 key, unsigned long rec);
+int nilfs_bmap_delete(struct nilfs_bmap *bmap, __u64 key);
+int nilfs_bmap_last_key(struct nilfs_bmap *bmap, __u64 *keyp);
+int nilfs_bmap_truncate(struct nilfs_bmap *bmap, __u64 key);
 void nilfs_bmap_clear(struct nilfs_bmap *);
 int nilfs_bmap_propagate(struct nilfs_bmap *, struct buffer_head *);
 void nilfs_bmap_lookup_dirty_buffers(struct nilfs_bmap *, struct list_head *);
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index 8b59695..cf9e489 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -106,7 +106,7 @@ int nilfs_get_block(struct inode *inode, sector_t blkoff,
err = nilfs_transaction_begin(inode->i_sb, &ti, 1);
if (unlikely(err))
goto out;
-   err = nilfs_bmap_insert(ii->i_bmap, (unsigned long)blkoff,
+

[PATCH 0/7] nilfs2 updates

2015-03-12 Thread Ryusuke Konishi
Hi Andrew,

Please queue the following changes for the next merge window:

Ryusuke Konishi (7):
  nilfs2: do not use async write flag for segment summary buffers
  nilfs2: use set_mask_bits() for operations on buffer state bitmap
  nilfs2: use bgl_lock_ptr()
  nilfs2: unify type of key arguments in bmap interface
  nilfs2: add bmap function to seek a valid key
  nilfs2: add helper to find existent block on metadata file
  nilfs2: improve execution time of NILFS_IOCTL_GET_CPINFO ioctl

* Brief summary

> nilfs2: do not use async write flag for segment summary buffers
> nilfs2: use set_mask_bits() for operations on buffer state bitmap

These reduce the number of atomic bit operations against b_state
bitmap by utilizing set_mask_bits() common helper, or by removing
unnecessary bit operations.

> nilfs2: use bgl_lock_ptr()

This is a cleanup patch, which makes use of bgl_lock_ptr() common
helper for simplicity.

> nilfs2: unify type of key arguments in bmap interface
> nilfs2: add bmap function to seek a valid key
> nilfs2: add helper to find existent block on metadata file
> nilfs2: improve execution time of NILFS_IOCTL_GET_CPINFO ioctl

These improve execution time of the ioctl for checkpoint listing,
which gets worse as the file system ages.

Example:
  [The current implementation]
  $ time lscp
   CNODATE TIME  MODE  FLG  BLKCNT   ICNT
   5769303  2015-02-22 19:31:33   cp-  108  1
   5769304  2015-02-22 19:38:54   cp-  108  1

  real0m0.182s
  user0m0.003s
  sys 0m0.180s

  [With the patchset]
  $ time lscp
   CNODATE TIME  MODE  FLG  BLKCNT   ICNT
   5769303  2015-02-22 19:31:33   cp-  108  1
   5769304  2015-02-22 19:38:54   cp-  108  1

  real0m0.003s
  user0m0.001s
  sys 0m0.002s

Thanks,
Ryusuke Konishi
--

 fs/nilfs2/alloc.c   |  5 ++--
 fs/nilfs2/bmap.c| 48 +-
 fs/nilfs2/bmap.h| 13 +++
 fs/nilfs2/btree.c   | 66 +
 fs/nilfs2/cpfile.c  | 58 +-
 fs/nilfs2/direct.c  | 17 ++
 fs/nilfs2/inode.c   |  6 ++---
 fs/nilfs2/mdt.c | 54 +++
 fs/nilfs2/mdt.h | 10 ++--
 fs/nilfs2/page.c| 24 ---
 fs/nilfs2/segment.c | 17 +++---
 11 files changed, 266 insertions(+), 52 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/7] nilfs2: improve execution time of NILFS_IOCTL_GET_CPINFO ioctl

2015-03-12 Thread Ryusuke Konishi
The older a filesystem gets, the slower lscp command becomes.  This is
because nilfs_cpfile_do_get_cpinfo() function meets more hole blocks
as the start offset of valid checkpoint numbers gets bigger.

This reduces the overhead by skipping hole blocks efficiently with
nilfs_mdt_find_block() helper.

A measurement result of this patch is as follows:

Before:
$ time lscp
 CNODATE TIME  MODE  FLG  BLKCNT   ICNT
 5769303  2015-02-22 19:31:33   cp-  108  1
 5769304  2015-02-22 19:38:54   cp-  108  1

real0m0.182s
user0m0.003s
sys 0m0.180s

After:
$ time lscp
 CNODATE TIME  MODE  FLG  BLKCNT   ICNT
 5769303  2015-02-22 19:31:33   cp-  108  1
 5769304  2015-02-22 19:38:54   cp-  108  1

real0m0.003s
user0m0.001s
sys 0m0.002s

Signed-off-by: Ryusuke Konishi 
---
 fs/nilfs2/cpfile.c | 58 --
 1 file changed, 52 insertions(+), 6 deletions(-)

diff --git a/fs/nilfs2/cpfile.c b/fs/nilfs2/cpfile.c
index 0d58075..b6596ca 100644
--- a/fs/nilfs2/cpfile.c
+++ b/fs/nilfs2/cpfile.c
@@ -53,6 +53,13 @@ nilfs_cpfile_get_offset(const struct inode *cpfile, __u64 
cno)
return do_div(tcno, nilfs_cpfile_checkpoints_per_block(cpfile));
 }
 
+static __u64 nilfs_cpfile_first_checkpoint_in_block(const struct inode *cpfile,
+   unsigned long blkoff)
+{
+   return (__u64)nilfs_cpfile_checkpoints_per_block(cpfile) * blkoff
+   + 1 - NILFS_MDT(cpfile)->mi_first_entry_offset;
+}
+
 static unsigned long
 nilfs_cpfile_checkpoints_in_block(const struct inode *cpfile,
  __u64 curr,
@@ -146,6 +153,44 @@ static inline int nilfs_cpfile_get_checkpoint_block(struct 
inode *cpfile,
   create, nilfs_cpfile_block_init, bhp);
 }
 
+/**
+ * nilfs_cpfile_find_checkpoint_block - find and get a buffer on cpfile
+ * @cpfile: inode of cpfile
+ * @start_cno: start checkpoint number (inclusive)
+ * @end_cno: end checkpoint number (inclusive)
+ * @cnop: place to store the next checkpoint number
+ * @bhp: place to store a pointer to buffer_head struct
+ *
+ * Return Value: On success, it returns 0. On error, the following negative
+ * error code is returned.
+ *
+ * %-ENOMEM - Insufficient memory available.
+ *
+ * %-EIO - I/O error
+ *
+ * %-ENOENT - no block exists in the range.
+ */
+static int nilfs_cpfile_find_checkpoint_block(struct inode *cpfile,
+ __u64 start_cno, __u64 end_cno,
+ __u64 *cnop,
+ struct buffer_head **bhp)
+{
+   unsigned long start, end, blkoff;
+   int ret;
+
+   if (unlikely(start_cno > end_cno))
+   return -ENOENT;
+
+   start = nilfs_cpfile_get_blkoff(cpfile, start_cno);
+   end = nilfs_cpfile_get_blkoff(cpfile, end_cno);
+
+   ret = nilfs_mdt_find_block(cpfile, start, end, &blkoff, bhp);
+   if (!ret)
+   *cnop = (blkoff == start) ? start_cno :
+   nilfs_cpfile_first_checkpoint_in_block(cpfile, blkoff);
+   return ret;
+}
+
 static inline int nilfs_cpfile_delete_checkpoint_block(struct inode *cpfile,
   __u64 cno)
 {
@@ -403,14 +448,15 @@ static ssize_t nilfs_cpfile_do_get_cpinfo(struct inode 
*cpfile, __u64 *cnop,
return -ENOENT; /* checkpoint number 0 is invalid */
down_read(&NILFS_MDT(cpfile)->mi_sem);
 
-   for (n = 0; cno < cur_cno && n < nci; cno += ncps) {
-   ncps = nilfs_cpfile_checkpoints_in_block(cpfile, cno, cur_cno);
-   ret = nilfs_cpfile_get_checkpoint_block(cpfile, cno, 0, &bh);
+   for (n = 0; n < nci; cno += ncps) {
+   ret = nilfs_cpfile_find_checkpoint_block(
+   cpfile, cno, cur_cno - 1, &cno, &bh);
if (ret < 0) {
-   if (ret != -ENOENT)
-   goto out;
-   continue; /* skip hole */
+   if (likely(ret == -ENOENT))
+   break;
+   goto out;
}
+   ncps = nilfs_cpfile_checkpoints_in_block(cpfile, cno, cur_cno);
 
kaddr = kmap_atomic(bh->b_page);
cp = nilfs_cpfile_block_get_checkpoint(cpfile, cno, bh, kaddr);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/7] nilfs2: do not use async write flag for segment summary buffers

2015-03-12 Thread Ryusuke Konishi
The async write flag is introduced to nilfs2 in the commit
7f42ec394156 ("nilfs2: fix issue with race condition of competition
between segments for dirty blocks"), but the flag only makes sense for
data buffers and btree node buffers.  It is not needed for segment
summary buffers.

This gits rid of the latter uses as part of refactoring of atomic bit
operations on buffer state bitmap.

Signed-off-by: Ryusuke Konishi 
Cc: Vyacheslav Dubeyko 
---
 fs/nilfs2/segment.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 0c3f303..c9a4e60 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1588,7 +1588,6 @@ static void nilfs_segctor_prepare_write(struct 
nilfs_sc_info *sci)
 
list_for_each_entry(bh, &segbuf->sb_segsum_buffers,
b_assoc_buffers) {
-   set_buffer_async_write(bh);
if (bh->b_page != bd_page) {
if (bd_page) {
lock_page(bd_page);
@@ -1688,7 +1687,6 @@ static void nilfs_abort_logs(struct list_head *logs, int 
err)
list_for_each_entry(segbuf, logs, sb_list) {
list_for_each_entry(bh, &segbuf->sb_segsum_buffers,
b_assoc_buffers) {
-   clear_buffer_async_write(bh);
if (bh->b_page != bd_page) {
if (bd_page)
end_page_writeback(bd_page);
@@ -1768,7 +1766,6 @@ static void nilfs_segctor_complete_write(struct 
nilfs_sc_info *sci)
b_assoc_buffers) {
set_buffer_uptodate(bh);
clear_buffer_dirty(bh);
-   clear_buffer_async_write(bh);
if (bh->b_page != bd_page) {
if (bd_page)
end_page_writeback(bd_page);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html