Re: [PATCHv2 1/3] fs/buffer.c: allocate buffer cache with user specific flag

2014-08-20 Thread Gioh Kim



2014-08-21 오전 7:02, Jan Kara 쓴 글:

On Wed 20-08-14 11:38:10, Gioh Kim wrote:



@@ -1381,12 +1383,7 @@ EXPORT_SYMBOL(__find_get_block);
  struct buffer_head *
  __getblk(struct block_device *bdev, sector_t block, unsigned size)
  {
-   struct buffer_head *bh = __find_get_block(bdev, block, size);
-
-   might_sleep();
-   if (bh == NULL)
-   bh = __getblk_slow(bdev, block, size);
-   return bh;
+   return __getblk_gfp(bdev, block, size, __GFP_MOVABLE);
  }
  EXPORT_SYMBOL(__getblk);

   Why did you remove the __find_get_block() call? That looks like a bug.

   I'm not sure if you didn't miss this comment


I'm sorry I missed it.
I think calling __find_get_block() in __getblk_gfp() can replace it.
I'm not sure about it.

If anybody disagree with it, I'll change it as the original code.

   OK, I see. Thanks for explanation. I agree we can remove
__find_get_block() from __getblk() but please make this change a separate
patch and also please put the might_sleep() check __getblk_gfp().

Honza



I got it.
I'm going to report v3 patch that reverts __getblk() and adds 
sb_bread_unmovable(),
if Andrew give me a feedback about the other codes.
I can wait ;-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 1/3] fs/buffer.c: allocate buffer cache with user specific flag

2014-08-20 Thread Jan Kara
On Wed 20-08-14 11:38:10, Gioh Kim wrote:
> 
> @@ -1381,12 +1383,7 @@ EXPORT_SYMBOL(__find_get_block);
>   struct buffer_head *
>   __getblk(struct block_device *bdev, sector_t block, unsigned size)
>   {
> -   struct buffer_head *bh = __find_get_block(bdev, block, size);
> -
> -   might_sleep();
> -   if (bh == NULL)
> -   bh = __getblk_slow(bdev, block, size);
> -   return bh;
> +   return __getblk_gfp(bdev, block, size, __GFP_MOVABLE);
>   }
>   EXPORT_SYMBOL(__getblk);
> >>>   Why did you remove the __find_get_block() call? That looks like a bug.
> >   I'm not sure if you didn't miss this comment
> 
> I'm sorry I missed it.
> I think calling __find_get_block() in __getblk_gfp() can replace it.
> I'm not sure about it.
> 
> If anybody disagree with it, I'll change it as the original code.
  OK, I see. Thanks for explanation. I agree we can remove
__find_get_block() from __getblk() but please make this change a separate
patch and also please put the might_sleep() check __getblk_gfp().

Honza
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 1/3] fs/buffer.c: allocate buffer cache with user specific flag

2014-08-20 Thread Jan Kara
On Wed 20-08-14 11:38:10, Gioh Kim wrote:
 
 @@ -1381,12 +1383,7 @@ EXPORT_SYMBOL(__find_get_block);
   struct buffer_head *
   __getblk(struct block_device *bdev, sector_t block, unsigned size)
   {
 -   struct buffer_head *bh = __find_get_block(bdev, block, size);
 -
 -   might_sleep();
 -   if (bh == NULL)
 -   bh = __getblk_slow(bdev, block, size);
 -   return bh;
 +   return __getblk_gfp(bdev, block, size, __GFP_MOVABLE);
   }
   EXPORT_SYMBOL(__getblk);
Why did you remove the __find_get_block() call? That looks like a bug.
I'm not sure if you didn't miss this comment
 
 I'm sorry I missed it.
 I think calling __find_get_block() in __getblk_gfp() can replace it.
 I'm not sure about it.
 
 If anybody disagree with it, I'll change it as the original code.
  OK, I see. Thanks for explanation. I agree we can remove
__find_get_block() from __getblk() but please make this change a separate
patch and also please put the might_sleep() check __getblk_gfp().

Honza
-- 
Jan Kara j...@suse.cz
SUSE Labs, CR
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 1/3] fs/buffer.c: allocate buffer cache with user specific flag

2014-08-20 Thread Gioh Kim



2014-08-21 오전 7:02, Jan Kara 쓴 글:

On Wed 20-08-14 11:38:10, Gioh Kim wrote:



@@ -1381,12 +1383,7 @@ EXPORT_SYMBOL(__find_get_block);
  struct buffer_head *
  __getblk(struct block_device *bdev, sector_t block, unsigned size)
  {
-   struct buffer_head *bh = __find_get_block(bdev, block, size);
-
-   might_sleep();
-   if (bh == NULL)
-   bh = __getblk_slow(bdev, block, size);
-   return bh;
+   return __getblk_gfp(bdev, block, size, __GFP_MOVABLE);
  }
  EXPORT_SYMBOL(__getblk);

   Why did you remove the __find_get_block() call? That looks like a bug.

   I'm not sure if you didn't miss this comment


I'm sorry I missed it.
I think calling __find_get_block() in __getblk_gfp() can replace it.
I'm not sure about it.

If anybody disagree with it, I'll change it as the original code.

   OK, I see. Thanks for explanation. I agree we can remove
__find_get_block() from __getblk() but please make this change a separate
patch and also please put the might_sleep() check __getblk_gfp().

Honza



I got it.
I'm going to report v3 patch that reverts __getblk() and adds 
sb_bread_unmovable(),
if Andrew give me a feedback about the other codes.
I can wait ;-)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 1/3] fs/buffer.c: allocate buffer cache with user specific flag

2014-08-19 Thread Gioh Kim



@@ -1381,12 +1383,7 @@ EXPORT_SYMBOL(__find_get_block);
  struct buffer_head *
  __getblk(struct block_device *bdev, sector_t block, unsigned size)
  {
-   struct buffer_head *bh = __find_get_block(bdev, block, size);
-
-   might_sleep();
-   if (bh == NULL)
-   bh = __getblk_slow(bdev, block, size);
-   return bh;
+   return __getblk_gfp(bdev, block, size, __GFP_MOVABLE);
  }
  EXPORT_SYMBOL(__getblk);

   Why did you remove the __find_get_block() call? That looks like a bug.

   I'm not sure if you didn't miss this comment


I'm sorry I missed it.
I think calling __find_get_block() in __getblk_gfp() can replace it.
I'm not sure about it.

If anybody disagree with it, I'll change it as the original code.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 1/3] fs/buffer.c: allocate buffer cache with user specific flag

2014-08-19 Thread Jan Kara
On Wed 20-08-14 08:37:07, Gioh Kim wrote:
> 
> 
> 2014-08-19 오후 10:03, Jan Kara 쓴 글:
> >   Hello,
> >
> >On Tue 19-08-14 15:52:38, Gioh Kim wrote:
> >>A buffer cache is allocated from movable area
> >>because it is referred for a while and released soon.
> >>But some filesystems are taking buffer cache for a long time
> >>and it can disturb page migration.
> >>
> >>A new API should be introduced to allocate buffer cache
> >>with user specific flag.
> >>For instance if user set flag to zero, buffer cache is allocated from
> >>non-movable area.
> >>
> >>Signed-off-by: Gioh Kim 
> >>---
> >>  fs/buffer.c |   52 
> >> +--
> >>  include/linux/buffer_head.h |   12 +-
> >>  2 files changed, 46 insertions(+), 18 deletions(-)
> >>
> >>diff --git a/fs/buffer.c b/fs/buffer.c
> >>index 8f05111..14f2f21 100644
> >>--- a/fs/buffer.c
> >>+++ b/fs/buffer.c
> >>@@ -993,7 +993,7 @@ init_page_buffers(struct page *page, struct 
> >>block_device *bdev,
> >>   */
> >>  static int
> >>  grow_dev_page(struct block_device *bdev, sector_t block,
> >>-   pgoff_t index, int size, int sizebits)
> >>+ pgoff_t index, int size, int sizebits, gfp_t gfp)
> >>  {
> >> struct inode *inode = bdev->bd_inode;
> >> struct page *page;
> >>@@ -1002,10 +1002,10 @@ grow_dev_page(struct block_device *bdev, sector_t 
> >>block,
> >> int ret = 0;/* Will call free_more_memory() */
> >> gfp_t gfp_mask;
> >>
> >>-   gfp_mask = mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS;
> >>-   gfp_mask |= __GFP_MOVABLE;
> >>+   gfp_mask = (mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS) | gfp;
> >>+
> >   Hum, it seems a bit misleading that the 'gfp' flags are just or-ed to
> >mapping_gfp_mask(inode->i_mapping). Usually, passed gfp mask is just
> >directly used. There are also interfaces like pagecache_get_page() which
> >play more complex tricks with mapping_gfp_mask(). This would be yet another
> >convention which I don't think is desirable. I know Andrew suggested what
> >you wrote so I guess I have to settle this with him. Andrew?
> 
> I don't know mapping_gfp_mask(). I just add gfp at the original code.
> Whould you tell me why it is undesirable?
  Well, it's not that mapping_gfp_mask() would be undesirable. It's that
the interface where you pass in gfp mask but it gets silently combined with
another gfp mask seems a bit error prone to me. So would prefer
grow_dev_page() to just use the gfp mask passed and then do something like:

struct buffer_head *getblk_unmovable(struct block_device *bdev, sector_t block,
unsigned size)
{
   return __getblk_gfp(bdev, block, size,
   mapping_gfp_mask(bdev->bd_inode->i_mapping));
}

And similarly in getblk() and other places. But before you go and do this,
I'd like Andrew to say what he thinks about it because maybe he had a good
reason why he wanted it the way you've implemented it.


> >>@@ -1381,12 +1383,7 @@ EXPORT_SYMBOL(__find_get_block);
> >>  struct buffer_head *
> >>  __getblk(struct block_device *bdev, sector_t block, unsigned size)
> >>  {
> >>-   struct buffer_head *bh = __find_get_block(bdev, block, size);
> >>-
> >>-   might_sleep();
> >>-   if (bh == NULL)
> >>-   bh = __getblk_slow(bdev, block, size);
> >>-   return bh;
> >>+   return __getblk_gfp(bdev, block, size, __GFP_MOVABLE);
> >>  }
> >>  EXPORT_SYMBOL(__getblk);
> >   Why did you remove the __find_get_block() call? That looks like a bug.
  I'm not sure if you didn't miss this comment

> I think the common interface is important.
> 
> If sb_getblk_unmovable() is obvious for the filesystem,
> I will add some codes for getblk_unmovable() which calling __getblk_gfp(),
> and sb_bread_unmovable() calling __bread_gfp().
> If so, sb_bread_gfp is not necessary.
> 
> It might be like followings:
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 14f2f21..35caf77 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1088,7 +1088,7 @@ grow_buffers(struct block_device *bdev, sector_t block, 
> int siz
> return grow_dev_page(bdev, block, index, size, sizebits, gfp);
>  }
> 
> -struct buffer_head *
> +static struct buffer_head *
>  __getblk_gfp(struct block_device *bdev, sector_t block,
>  unsigned size, gfp_t gfp)
>  {
> @@ -1119,7 +1119,13 @@ __getblk_gfp(struct block_device *bdev, sector_t block,
> free_more_memory();
> }
>  }
> -EXPORT_SYMBOL(__getblk_gfp);
> +
> +struct buffer_head *getblk_unmovable(struct block_device *bdev, sector_t 
> block,
> +unsigned size)
> +{
> +   return __getblk_gfp(bdev, block, size, 0);
> +}
> +EXPORT_SYMBOL(getblk_unmovable);
> 
>  /*
>   * The relationship between dirty buffers and dirty pages:
> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
> index a1d73fd..c5fb4fc 100644
> --- a/include/linux/buffer_head.h
> +++ 

Re: [PATCHv2 1/3] fs/buffer.c: allocate buffer cache with user specific flag

2014-08-19 Thread Gioh Kim



2014-08-19 오후 10:03, Jan Kara 쓴 글:

   Hello,

On Tue 19-08-14 15:52:38, Gioh Kim wrote:

A buffer cache is allocated from movable area
because it is referred for a while and released soon.
But some filesystems are taking buffer cache for a long time
and it can disturb page migration.

A new API should be introduced to allocate buffer cache
with user specific flag.
For instance if user set flag to zero, buffer cache is allocated from
non-movable area.

Signed-off-by: Gioh Kim 
---
  fs/buffer.c |   52 +--
  include/linux/buffer_head.h |   12 +-
  2 files changed, 46 insertions(+), 18 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 8f05111..14f2f21 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -993,7 +993,7 @@ init_page_buffers(struct page *page, struct block_device 
*bdev,
   */
  static int
  grow_dev_page(struct block_device *bdev, sector_t block,
-   pgoff_t index, int size, int sizebits)
+ pgoff_t index, int size, int sizebits, gfp_t gfp)
  {
 struct inode *inode = bdev->bd_inode;
 struct page *page;
@@ -1002,10 +1002,10 @@ grow_dev_page(struct block_device *bdev, sector_t block,
 int ret = 0;/* Will call free_more_memory() */
 gfp_t gfp_mask;

-   gfp_mask = mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS;
-   gfp_mask |= __GFP_MOVABLE;
+   gfp_mask = (mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS) | gfp;
+

   Hum, it seems a bit misleading that the 'gfp' flags are just or-ed to
mapping_gfp_mask(inode->i_mapping). Usually, passed gfp mask is just
directly used. There are also interfaces like pagecache_get_page() which
play more complex tricks with mapping_gfp_mask(). This would be yet another
convention which I don't think is desirable. I know Andrew suggested what
you wrote so I guess I have to settle this with him. Andrew?


I don't know mapping_gfp_mask(). I just add gfp at the original code.
Whould you tell me why it is undesirable?




 /*
-* XXX: __getblk_slow() can not really deal with failure and
+* XXX: __getblk_gfp() can not really deal with failure and
  * will endlessly loop on improvised global reclaim.  Prefer
  * looping in the allocator rather than here, at least that
  * code knows what it's doing.
@@ -1058,7 +1058,7 @@ failed:
   * that page was dirty, the buffers are set dirty also.
   */
  static int
-grow_buffers(struct block_device *bdev, sector_t block, int size)
+grow_buffers(struct block_device *bdev, sector_t block, int size, gfp_t gfp)
  {
 pgoff_t index;
 int sizebits;
@@ -1085,11 +1085,12 @@ grow_buffers(struct block_device *bdev, sector_t block, 
int size)
 }

 /* Create a page with the proper size buffers.. */
-   return grow_dev_page(bdev, block, index, size, sizebits);
+   return grow_dev_page(bdev, block, index, size, sizebits, gfp);
  }

-static struct buffer_head *
-__getblk_slow(struct block_device *bdev, sector_t block, int size)
+struct buffer_head *
+__getblk_gfp(struct block_device *bdev, sector_t block,
+unsigned size, gfp_t gfp)
  {
 /* Size must be multiple of hard sectorsize */
 if (unlikely(size & (bdev_logical_block_size(bdev)-1) ||
@@ -,13 +1112,14 @@ __getblk_slow(struct block_device *bdev, sector_t 
block, int size)
 if (bh)
 return bh;

-   ret = grow_buffers(bdev, block, size);
+   ret = grow_buffers(bdev, block, size, gfp);
 if (ret < 0)
 return NULL;
 if (ret == 0)
 free_more_memory();
 }
  }
+EXPORT_SYMBOL(__getblk_gfp);

  /*
   * The relationship between dirty buffers and dirty pages:
@@ -1381,12 +1383,7 @@ EXPORT_SYMBOL(__find_get_block);
  struct buffer_head *
  __getblk(struct block_device *bdev, sector_t block, unsigned size)
  {
-   struct buffer_head *bh = __find_get_block(bdev, block, size);
-
-   might_sleep();
-   if (bh == NULL)
-   bh = __getblk_slow(bdev, block, size);
-   return bh;
+   return __getblk_gfp(bdev, block, size, __GFP_MOVABLE);
  }
  EXPORT_SYMBOL(__getblk);

   Why did you remove the __find_get_block() call? That looks like a bug.


@@ -1410,18 +1407,39 @@ EXPORT_SYMBOL(__breadahead);
   *  @size: size (in bytes) to read
   *
   *  Reads a specified block, and returns buffer head that contains it.
+ *  The page cache is allocated from movable area so that it can be migrated.
   *  It returns NULL if the block was unreadable.
   */
  struct buffer_head *
  __bread(struct block_device *bdev, sector_t block, unsigned size)
  {
-   struct buffer_head *bh = __getblk(bdev, block, size);
+   return __bread_gfp(bdev, block, size, __GFP_MOVABLE);
+}
+EXPORT_SYMBOL(__bread);
+
+/**
+ *  __bread_gfp() - reads a specified block and returns the bh
+ *  @bdev: the 

Re: [PATCHv2 1/3] fs/buffer.c: allocate buffer cache with user specific flag

2014-08-19 Thread Jan Kara
  Hello,

On Tue 19-08-14 15:52:38, Gioh Kim wrote:
> A buffer cache is allocated from movable area
> because it is referred for a while and released soon.
> But some filesystems are taking buffer cache for a long time
> and it can disturb page migration.
> 
> A new API should be introduced to allocate buffer cache
> with user specific flag.
> For instance if user set flag to zero, buffer cache is allocated from
> non-movable area.
> 
> Signed-off-by: Gioh Kim 
> ---
>  fs/buffer.c |   52 
> +--
>  include/linux/buffer_head.h |   12 +-
>  2 files changed, 46 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 8f05111..14f2f21 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -993,7 +993,7 @@ init_page_buffers(struct page *page, struct block_device 
> *bdev,
>   */
>  static int
>  grow_dev_page(struct block_device *bdev, sector_t block,
> -   pgoff_t index, int size, int sizebits)
> + pgoff_t index, int size, int sizebits, gfp_t gfp)
>  {
> struct inode *inode = bdev->bd_inode;
> struct page *page;
> @@ -1002,10 +1002,10 @@ grow_dev_page(struct block_device *bdev, sector_t 
> block,
> int ret = 0;/* Will call free_more_memory() */
> gfp_t gfp_mask;
> 
> -   gfp_mask = mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS;
> -   gfp_mask |= __GFP_MOVABLE;
> +   gfp_mask = (mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS) | gfp;
> +
  Hum, it seems a bit misleading that the 'gfp' flags are just or-ed to
mapping_gfp_mask(inode->i_mapping). Usually, passed gfp mask is just
directly used. There are also interfaces like pagecache_get_page() which
play more complex tricks with mapping_gfp_mask(). This would be yet another
convention which I don't think is desirable. I know Andrew suggested what
you wrote so I guess I have to settle this with him. Andrew?

> /*
> -* XXX: __getblk_slow() can not really deal with failure and
> +* XXX: __getblk_gfp() can not really deal with failure and
>  * will endlessly loop on improvised global reclaim.  Prefer
>  * looping in the allocator rather than here, at least that
>  * code knows what it's doing.
> @@ -1058,7 +1058,7 @@ failed:
>   * that page was dirty, the buffers are set dirty also.
>   */
>  static int
> -grow_buffers(struct block_device *bdev, sector_t block, int size)
> +grow_buffers(struct block_device *bdev, sector_t block, int size, gfp_t gfp)
>  {
> pgoff_t index;
> int sizebits;
> @@ -1085,11 +1085,12 @@ grow_buffers(struct block_device *bdev, sector_t 
> block, int size)
> }
> 
> /* Create a page with the proper size buffers.. */
> -   return grow_dev_page(bdev, block, index, size, sizebits);
> +   return grow_dev_page(bdev, block, index, size, sizebits, gfp);
>  }
> 
> -static struct buffer_head *
> -__getblk_slow(struct block_device *bdev, sector_t block, int size)
> +struct buffer_head *
> +__getblk_gfp(struct block_device *bdev, sector_t block,
> +unsigned size, gfp_t gfp)
>  {
> /* Size must be multiple of hard sectorsize */
> if (unlikely(size & (bdev_logical_block_size(bdev)-1) ||
> @@ -,13 +1112,14 @@ __getblk_slow(struct block_device *bdev, sector_t 
> block, int size)
> if (bh)
> return bh;
> 
> -   ret = grow_buffers(bdev, block, size);
> +   ret = grow_buffers(bdev, block, size, gfp);
> if (ret < 0)
> return NULL;
> if (ret == 0)
> free_more_memory();
> }
>  }
> +EXPORT_SYMBOL(__getblk_gfp);
> 
>  /*
>   * The relationship between dirty buffers and dirty pages:
> @@ -1381,12 +1383,7 @@ EXPORT_SYMBOL(__find_get_block);
>  struct buffer_head *
>  __getblk(struct block_device *bdev, sector_t block, unsigned size)
>  {
> -   struct buffer_head *bh = __find_get_block(bdev, block, size);
> -
> -   might_sleep();
> -   if (bh == NULL)
> -   bh = __getblk_slow(bdev, block, size);
> -   return bh;
> +   return __getblk_gfp(bdev, block, size, __GFP_MOVABLE);
>  }
>  EXPORT_SYMBOL(__getblk);
  Why did you remove the __find_get_block() call? That looks like a bug.

> @@ -1410,18 +1407,39 @@ EXPORT_SYMBOL(__breadahead);
>   *  @size: size (in bytes) to read
>   *
>   *  Reads a specified block, and returns buffer head that contains it.
> + *  The page cache is allocated from movable area so that it can be migrated.
>   *  It returns NULL if the block was unreadable.
>   */
>  struct buffer_head *
>  __bread(struct block_device *bdev, sector_t block, unsigned size)
>  {
> -   struct buffer_head *bh = __getblk(bdev, block, size);
> +   return __bread_gfp(bdev, block, size, __GFP_MOVABLE);
> +}
> +EXPORT_SYMBOL(__bread);
> +
> +/**
> + *  __bread_gfp() - reads a specified block and returns 

Re: [PATCHv2 1/3] fs/buffer.c: allocate buffer cache with user specific flag

2014-08-19 Thread Gioh Kim



2014-08-19 오후 10:03, Jan Kara 쓴 글:

   Hello,

On Tue 19-08-14 15:52:38, Gioh Kim wrote:

A buffer cache is allocated from movable area
because it is referred for a while and released soon.
But some filesystems are taking buffer cache for a long time
and it can disturb page migration.

A new API should be introduced to allocate buffer cache
with user specific flag.
For instance if user set flag to zero, buffer cache is allocated from
non-movable area.

Signed-off-by: Gioh Kim gioh@lge.com
---
  fs/buffer.c |   52 +--
  include/linux/buffer_head.h |   12 +-
  2 files changed, 46 insertions(+), 18 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 8f05111..14f2f21 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -993,7 +993,7 @@ init_page_buffers(struct page *page, struct block_device 
*bdev,
   */
  static int
  grow_dev_page(struct block_device *bdev, sector_t block,
-   pgoff_t index, int size, int sizebits)
+ pgoff_t index, int size, int sizebits, gfp_t gfp)
  {
 struct inode *inode = bdev-bd_inode;
 struct page *page;
@@ -1002,10 +1002,10 @@ grow_dev_page(struct block_device *bdev, sector_t block,
 int ret = 0;/* Will call free_more_memory() */
 gfp_t gfp_mask;

-   gfp_mask = mapping_gfp_mask(inode-i_mapping)  ~__GFP_FS;
-   gfp_mask |= __GFP_MOVABLE;
+   gfp_mask = (mapping_gfp_mask(inode-i_mapping)  ~__GFP_FS) | gfp;
+

   Hum, it seems a bit misleading that the 'gfp' flags are just or-ed to
mapping_gfp_mask(inode-i_mapping). Usually, passed gfp mask is just
directly used. There are also interfaces like pagecache_get_page() which
play more complex tricks with mapping_gfp_mask(). This would be yet another
convention which I don't think is desirable. I know Andrew suggested what
you wrote so I guess I have to settle this with him. Andrew?


I don't know mapping_gfp_mask(). I just add gfp at the original code.
Whould you tell me why it is undesirable?




 /*
-* XXX: __getblk_slow() can not really deal with failure and
+* XXX: __getblk_gfp() can not really deal with failure and
  * will endlessly loop on improvised global reclaim.  Prefer
  * looping in the allocator rather than here, at least that
  * code knows what it's doing.
@@ -1058,7 +1058,7 @@ failed:
   * that page was dirty, the buffers are set dirty also.
   */
  static int
-grow_buffers(struct block_device *bdev, sector_t block, int size)
+grow_buffers(struct block_device *bdev, sector_t block, int size, gfp_t gfp)
  {
 pgoff_t index;
 int sizebits;
@@ -1085,11 +1085,12 @@ grow_buffers(struct block_device *bdev, sector_t block, 
int size)
 }

 /* Create a page with the proper size buffers.. */
-   return grow_dev_page(bdev, block, index, size, sizebits);
+   return grow_dev_page(bdev, block, index, size, sizebits, gfp);
  }

-static struct buffer_head *
-__getblk_slow(struct block_device *bdev, sector_t block, int size)
+struct buffer_head *
+__getblk_gfp(struct block_device *bdev, sector_t block,
+unsigned size, gfp_t gfp)
  {
 /* Size must be multiple of hard sectorsize */
 if (unlikely(size  (bdev_logical_block_size(bdev)-1) ||
@@ -,13 +1112,14 @@ __getblk_slow(struct block_device *bdev, sector_t 
block, int size)
 if (bh)
 return bh;

-   ret = grow_buffers(bdev, block, size);
+   ret = grow_buffers(bdev, block, size, gfp);
 if (ret  0)
 return NULL;
 if (ret == 0)
 free_more_memory();
 }
  }
+EXPORT_SYMBOL(__getblk_gfp);

  /*
   * The relationship between dirty buffers and dirty pages:
@@ -1381,12 +1383,7 @@ EXPORT_SYMBOL(__find_get_block);
  struct buffer_head *
  __getblk(struct block_device *bdev, sector_t block, unsigned size)
  {
-   struct buffer_head *bh = __find_get_block(bdev, block, size);
-
-   might_sleep();
-   if (bh == NULL)
-   bh = __getblk_slow(bdev, block, size);
-   return bh;
+   return __getblk_gfp(bdev, block, size, __GFP_MOVABLE);
  }
  EXPORT_SYMBOL(__getblk);

   Why did you remove the __find_get_block() call? That looks like a bug.


@@ -1410,18 +1407,39 @@ EXPORT_SYMBOL(__breadahead);
   *  @size: size (in bytes) to read
   *
   *  Reads a specified block, and returns buffer head that contains it.
+ *  The page cache is allocated from movable area so that it can be migrated.
   *  It returns NULL if the block was unreadable.
   */
  struct buffer_head *
  __bread(struct block_device *bdev, sector_t block, unsigned size)
  {
-   struct buffer_head *bh = __getblk(bdev, block, size);
+   return __bread_gfp(bdev, block, size, __GFP_MOVABLE);
+}
+EXPORT_SYMBOL(__bread);
+
+/**
+ *  __bread_gfp() - reads a specified block and returns the bh
+ *  @bdev: 

Re: [PATCHv2 1/3] fs/buffer.c: allocate buffer cache with user specific flag

2014-08-19 Thread Jan Kara
On Wed 20-08-14 08:37:07, Gioh Kim wrote:
 
 
 2014-08-19 오후 10:03, Jan Kara 쓴 글:
Hello,
 
 On Tue 19-08-14 15:52:38, Gioh Kim wrote:
 A buffer cache is allocated from movable area
 because it is referred for a while and released soon.
 But some filesystems are taking buffer cache for a long time
 and it can disturb page migration.
 
 A new API should be introduced to allocate buffer cache
 with user specific flag.
 For instance if user set flag to zero, buffer cache is allocated from
 non-movable area.
 
 Signed-off-by: Gioh Kim gioh@lge.com
 ---
   fs/buffer.c |   52 
  +--
   include/linux/buffer_head.h |   12 +-
   2 files changed, 46 insertions(+), 18 deletions(-)
 
 diff --git a/fs/buffer.c b/fs/buffer.c
 index 8f05111..14f2f21 100644
 --- a/fs/buffer.c
 +++ b/fs/buffer.c
 @@ -993,7 +993,7 @@ init_page_buffers(struct page *page, struct 
 block_device *bdev,
*/
   static int
   grow_dev_page(struct block_device *bdev, sector_t block,
 -   pgoff_t index, int size, int sizebits)
 + pgoff_t index, int size, int sizebits, gfp_t gfp)
   {
  struct inode *inode = bdev-bd_inode;
  struct page *page;
 @@ -1002,10 +1002,10 @@ grow_dev_page(struct block_device *bdev, sector_t 
 block,
  int ret = 0;/* Will call free_more_memory() */
  gfp_t gfp_mask;
 
 -   gfp_mask = mapping_gfp_mask(inode-i_mapping)  ~__GFP_FS;
 -   gfp_mask |= __GFP_MOVABLE;
 +   gfp_mask = (mapping_gfp_mask(inode-i_mapping)  ~__GFP_FS) | gfp;
 +
Hum, it seems a bit misleading that the 'gfp' flags are just or-ed to
 mapping_gfp_mask(inode-i_mapping). Usually, passed gfp mask is just
 directly used. There are also interfaces like pagecache_get_page() which
 play more complex tricks with mapping_gfp_mask(). This would be yet another
 convention which I don't think is desirable. I know Andrew suggested what
 you wrote so I guess I have to settle this with him. Andrew?
 
 I don't know mapping_gfp_mask(). I just add gfp at the original code.
 Whould you tell me why it is undesirable?
  Well, it's not that mapping_gfp_mask() would be undesirable. It's that
the interface where you pass in gfp mask but it gets silently combined with
another gfp mask seems a bit error prone to me. So would prefer
grow_dev_page() to just use the gfp mask passed and then do something like:

struct buffer_head *getblk_unmovable(struct block_device *bdev, sector_t block,
unsigned size)
{
   return __getblk_gfp(bdev, block, size,
   mapping_gfp_mask(bdev-bd_inode-i_mapping));
}

And similarly in getblk() and other places. But before you go and do this,
I'd like Andrew to say what he thinks about it because maybe he had a good
reason why he wanted it the way you've implemented it.


 @@ -1381,12 +1383,7 @@ EXPORT_SYMBOL(__find_get_block);
   struct buffer_head *
   __getblk(struct block_device *bdev, sector_t block, unsigned size)
   {
 -   struct buffer_head *bh = __find_get_block(bdev, block, size);
 -
 -   might_sleep();
 -   if (bh == NULL)
 -   bh = __getblk_slow(bdev, block, size);
 -   return bh;
 +   return __getblk_gfp(bdev, block, size, __GFP_MOVABLE);
   }
   EXPORT_SYMBOL(__getblk);
Why did you remove the __find_get_block() call? That looks like a bug.
  I'm not sure if you didn't miss this comment

 I think the common interface is important.
 
 If sb_getblk_unmovable() is obvious for the filesystem,
 I will add some codes for getblk_unmovable() which calling __getblk_gfp(),
 and sb_bread_unmovable() calling __bread_gfp().
 If so, sb_bread_gfp is not necessary.
 
 It might be like followings:
 
 diff --git a/fs/buffer.c b/fs/buffer.c
 index 14f2f21..35caf77 100644
 --- a/fs/buffer.c
 +++ b/fs/buffer.c
 @@ -1088,7 +1088,7 @@ grow_buffers(struct block_device *bdev, sector_t block, 
 int siz
 return grow_dev_page(bdev, block, index, size, sizebits, gfp);
  }
 
 -struct buffer_head *
 +static struct buffer_head *
  __getblk_gfp(struct block_device *bdev, sector_t block,
  unsigned size, gfp_t gfp)
  {
 @@ -1119,7 +1119,13 @@ __getblk_gfp(struct block_device *bdev, sector_t block,
 free_more_memory();
 }
  }
 -EXPORT_SYMBOL(__getblk_gfp);
 +
 +struct buffer_head *getblk_unmovable(struct block_device *bdev, sector_t 
 block,
 +unsigned size)
 +{
 +   return __getblk_gfp(bdev, block, size, 0);
 +}
 +EXPORT_SYMBOL(getblk_unmovable);
 
  /*
   * The relationship between dirty buffers and dirty pages:
 diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
 index a1d73fd..c5fb4fc 100644
 --- a/include/linux/buffer_head.h
 +++ b/include/linux/buffer_head.h
 @@ -177,8 +177,8 @@ struct buffer_head *__find_get_block(struct block_device 
 *bdev, s
 unsigned size);
  struct buffer_head *__getblk(struct block_device *bdev, sector_t 

Re: [PATCHv2 1/3] fs/buffer.c: allocate buffer cache with user specific flag

2014-08-19 Thread Gioh Kim



@@ -1381,12 +1383,7 @@ EXPORT_SYMBOL(__find_get_block);
  struct buffer_head *
  __getblk(struct block_device *bdev, sector_t block, unsigned size)
  {
-   struct buffer_head *bh = __find_get_block(bdev, block, size);
-
-   might_sleep();
-   if (bh == NULL)
-   bh = __getblk_slow(bdev, block, size);
-   return bh;
+   return __getblk_gfp(bdev, block, size, __GFP_MOVABLE);
  }
  EXPORT_SYMBOL(__getblk);

   Why did you remove the __find_get_block() call? That looks like a bug.

   I'm not sure if you didn't miss this comment


I'm sorry I missed it.
I think calling __find_get_block() in __getblk_gfp() can replace it.
I'm not sure about it.

If anybody disagree with it, I'll change it as the original code.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 1/3] fs/buffer.c: allocate buffer cache with user specific flag

2014-08-19 Thread Jan Kara
  Hello,

On Tue 19-08-14 15:52:38, Gioh Kim wrote:
 A buffer cache is allocated from movable area
 because it is referred for a while and released soon.
 But some filesystems are taking buffer cache for a long time
 and it can disturb page migration.
 
 A new API should be introduced to allocate buffer cache
 with user specific flag.
 For instance if user set flag to zero, buffer cache is allocated from
 non-movable area.
 
 Signed-off-by: Gioh Kim gioh@lge.com
 ---
  fs/buffer.c |   52 
 +--
  include/linux/buffer_head.h |   12 +-
  2 files changed, 46 insertions(+), 18 deletions(-)
 
 diff --git a/fs/buffer.c b/fs/buffer.c
 index 8f05111..14f2f21 100644
 --- a/fs/buffer.c
 +++ b/fs/buffer.c
 @@ -993,7 +993,7 @@ init_page_buffers(struct page *page, struct block_device 
 *bdev,
   */
  static int
  grow_dev_page(struct block_device *bdev, sector_t block,
 -   pgoff_t index, int size, int sizebits)
 + pgoff_t index, int size, int sizebits, gfp_t gfp)
  {
 struct inode *inode = bdev-bd_inode;
 struct page *page;
 @@ -1002,10 +1002,10 @@ grow_dev_page(struct block_device *bdev, sector_t 
 block,
 int ret = 0;/* Will call free_more_memory() */
 gfp_t gfp_mask;
 
 -   gfp_mask = mapping_gfp_mask(inode-i_mapping)  ~__GFP_FS;
 -   gfp_mask |= __GFP_MOVABLE;
 +   gfp_mask = (mapping_gfp_mask(inode-i_mapping)  ~__GFP_FS) | gfp;
 +
  Hum, it seems a bit misleading that the 'gfp' flags are just or-ed to
mapping_gfp_mask(inode-i_mapping). Usually, passed gfp mask is just
directly used. There are also interfaces like pagecache_get_page() which
play more complex tricks with mapping_gfp_mask(). This would be yet another
convention which I don't think is desirable. I know Andrew suggested what
you wrote so I guess I have to settle this with him. Andrew?

 /*
 -* XXX: __getblk_slow() can not really deal with failure and
 +* XXX: __getblk_gfp() can not really deal with failure and
  * will endlessly loop on improvised global reclaim.  Prefer
  * looping in the allocator rather than here, at least that
  * code knows what it's doing.
 @@ -1058,7 +1058,7 @@ failed:
   * that page was dirty, the buffers are set dirty also.
   */
  static int
 -grow_buffers(struct block_device *bdev, sector_t block, int size)
 +grow_buffers(struct block_device *bdev, sector_t block, int size, gfp_t gfp)
  {
 pgoff_t index;
 int sizebits;
 @@ -1085,11 +1085,12 @@ grow_buffers(struct block_device *bdev, sector_t 
 block, int size)
 }
 
 /* Create a page with the proper size buffers.. */
 -   return grow_dev_page(bdev, block, index, size, sizebits);
 +   return grow_dev_page(bdev, block, index, size, sizebits, gfp);
  }
 
 -static struct buffer_head *
 -__getblk_slow(struct block_device *bdev, sector_t block, int size)
 +struct buffer_head *
 +__getblk_gfp(struct block_device *bdev, sector_t block,
 +unsigned size, gfp_t gfp)
  {
 /* Size must be multiple of hard sectorsize */
 if (unlikely(size  (bdev_logical_block_size(bdev)-1) ||
 @@ -,13 +1112,14 @@ __getblk_slow(struct block_device *bdev, sector_t 
 block, int size)
 if (bh)
 return bh;
 
 -   ret = grow_buffers(bdev, block, size);
 +   ret = grow_buffers(bdev, block, size, gfp);
 if (ret  0)
 return NULL;
 if (ret == 0)
 free_more_memory();
 }
  }
 +EXPORT_SYMBOL(__getblk_gfp);
 
  /*
   * The relationship between dirty buffers and dirty pages:
 @@ -1381,12 +1383,7 @@ EXPORT_SYMBOL(__find_get_block);
  struct buffer_head *
  __getblk(struct block_device *bdev, sector_t block, unsigned size)
  {
 -   struct buffer_head *bh = __find_get_block(bdev, block, size);
 -
 -   might_sleep();
 -   if (bh == NULL)
 -   bh = __getblk_slow(bdev, block, size);
 -   return bh;
 +   return __getblk_gfp(bdev, block, size, __GFP_MOVABLE);
  }
  EXPORT_SYMBOL(__getblk);
  Why did you remove the __find_get_block() call? That looks like a bug.

 @@ -1410,18 +1407,39 @@ EXPORT_SYMBOL(__breadahead);
   *  @size: size (in bytes) to read
   *
   *  Reads a specified block, and returns buffer head that contains it.
 + *  The page cache is allocated from movable area so that it can be migrated.
   *  It returns NULL if the block was unreadable.
   */
  struct buffer_head *
  __bread(struct block_device *bdev, sector_t block, unsigned size)
  {
 -   struct buffer_head *bh = __getblk(bdev, block, size);
 +   return __bread_gfp(bdev, block, size, __GFP_MOVABLE);
 +}
 +EXPORT_SYMBOL(__bread);
 +
 +/**
 + *  __bread_gfp() - reads a specified block and returns the bh
 + *  @bdev: the block_device to read from
 + *  @block: number of block
 + *  @size: size (in bytes) to