Re: [PATCH 4/5] block: introduce helpers for allocating IO buffers from slab

2018-10-18 Thread Dave Chinner
On Fri, Oct 19, 2018 at 10:53:49AM +0800, Ming Lei wrote:
> On Thu, Oct 18, 2018 at 05:22:19PM +0200, Christoph Hellwig wrote:
> > On Thu, Oct 18, 2018 at 08:11:23AM -0700, Matthew Wilcox wrote:
> > > On Thu, Oct 18, 2018 at 04:42:07PM +0200, Christoph Hellwig wrote:
> > > > This all seems quite complicated.
> > > >
> > > > I think the interface we'd want is more one that keeps a little
> > > > cache of a single page in the queue, plus a little bitmap recording
> > > > which sub-page-size blocks of it are in use.
> > > >
> > > > Something like (pseudo code, minus locking):
> > > >
> > > > void *blk_alloc_sector_buffer(struct block_device *bdev, gfp_t gfp)
> > > > {
> > > > 	unsigned block_size = block_size(bdev);
> > > >
> > > > 	if (block_size >= PAGE_SIZE)
> > > > 		return (void *)__get_free_pages(gfp, get_order(block_size));
> > > >
> > > > find_again:
> > > > 	if (bdev->fragment_cache_page) {
> > > > 		[ search fragment_cache_page using
> > > > 		  e.g. a bitmap and return if found ]
> > > > 	}
> > > >
> > > > 	bdev->fragment_cache_page = (void *)__get_free_page(gfp);
> > > > 	goto find_again;
> > > > }
> > > 
> > > This looks a lot like page_frag_alloc() except I think page_frag_alloc()
> > > may be more efficient.
> > 
> > Oh, nice. Sounds like XFS should just use page_frag_alloc.  I'll give
> > it a spin.
> 
> XFS or other filesystems can use page_frag_alloc() directly; it no longer
> seems necessary to introduce this change in the block layer, given that a
> 512-aligned buffer should be fine everywhere.
> 
> The only benefit of making it a block helper is that the offset or size
> can be checked against q->dma_alignment.
> 
> Dave/Jens, which way do you think is better: the allocation as a block
> helper, or the fs using page_frag_alloc() directly to allocate 512*N-byte
> buffers (total size less than PAGE_SIZE)?

Christoph has already said he's looking at using page_frag_alloc()
directly in XFS.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH 4/5] block: introduce helpers for allocating IO buffers from slab

2018-10-18 Thread Jens Axboe
On 10/18/18 8:53 PM, Ming Lei wrote:
> On Thu, Oct 18, 2018 at 05:22:19PM +0200, Christoph Hellwig wrote:
>> On Thu, Oct 18, 2018 at 08:11:23AM -0700, Matthew Wilcox wrote:
>>> On Thu, Oct 18, 2018 at 04:42:07PM +0200, Christoph Hellwig wrote:
>>>> This all seems quite complicated.
>>>>
>>>> I think the interface we'd want is more one that keeps a little
>>>> cache of a single page in the queue, plus a little bitmap recording
>>>> which sub-page-size blocks of it are in use.
>>>>
>>>> Something like (pseudo code, minus locking):
>>>>
>>>> void *blk_alloc_sector_buffer(struct block_device *bdev, gfp_t gfp)
>>>> {
>>>> 	unsigned block_size = block_size(bdev);
>>>>
>>>> 	if (block_size >= PAGE_SIZE)
>>>> 		return (void *)__get_free_pages(gfp, get_order(block_size));
>>>>
>>>> find_again:
>>>> 	if (bdev->fragment_cache_page) {
>>>> 		[ search fragment_cache_page using
>>>> 		  e.g. a bitmap and return if found ]
>>>> 	}
>>>>
>>>> 	bdev->fragment_cache_page = (void *)__get_free_page(gfp);
>>>> 	goto find_again;
>>>> }
>>>
>>> This looks a lot like page_frag_alloc() except I think page_frag_alloc()
>>> may be more efficient.
>>
>> Oh, nice. Sounds like XFS should just use page_frag_alloc.  I'll give
>> it a spin.
> 
> XFS or other filesystems can use page_frag_alloc() directly; it no longer
> seems necessary to introduce this change in the block layer, given that a
> 512-aligned buffer should be fine everywhere.
> 
> The only benefit of making it a block helper is that the offset or size
> can be checked against q->dma_alignment.
> 
> Dave/Jens, which way do you think is better: the allocation as a block
> helper, or the fs using page_frag_alloc() directly to allocate 512*N-byte
> buffers (total size less than PAGE_SIZE)?

I'd greatly prefer having the FS use that directly; it seems kind of
pointless to provide an abstraction for that at this point.

-- 
Jens Axboe



Re: [PATCH 4/5] block: introduce helpers for allocating IO buffers from slab

2018-10-18 Thread Ming Lei
On Thu, Oct 18, 2018 at 05:22:19PM +0200, Christoph Hellwig wrote:
> On Thu, Oct 18, 2018 at 08:11:23AM -0700, Matthew Wilcox wrote:
> > On Thu, Oct 18, 2018 at 04:42:07PM +0200, Christoph Hellwig wrote:
> > > This all seems quite complicated.
> > > 
> > > I think the interface we'd want is more one that keeps a little
> > > cache of a single page in the queue, plus a little bitmap recording
> > > which sub-page-size blocks of it are in use.
> > > 
> > > Something like (pseudo code, minus locking):
> > > 
> > > void *blk_alloc_sector_buffer(struct block_device *bdev, gfp_t gfp)
> > > {
> > > 	unsigned block_size = block_size(bdev);
> > > 
> > > 	if (block_size >= PAGE_SIZE)
> > > 		return (void *)__get_free_pages(gfp, get_order(block_size));
> > > 
> > > find_again:
> > > 	if (bdev->fragment_cache_page) {
> > > 		[ search fragment_cache_page using
> > > 		  e.g. a bitmap and return if found ]
> > > 	}
> > > 
> > > 	bdev->fragment_cache_page = (void *)__get_free_page(gfp);
> > > 	goto find_again;
> > > }
> > 
> > This looks a lot like page_frag_alloc() except I think page_frag_alloc()
> > may be more efficient.
> 
> Oh, nice. Sounds like XFS should just use page_frag_alloc.  I'll give
> it a spin.

XFS or other filesystems can use page_frag_alloc() directly; it no longer
seems necessary to introduce this change in the block layer, given that a
512-aligned buffer should be fine everywhere.

The only benefit of making it a block helper is that the offset or size
can be checked against q->dma_alignment.

Dave/Jens, which way do you think is better: the allocation as a block
helper, or the fs using page_frag_alloc() directly to allocate 512*N-byte
buffers (total size less than PAGE_SIZE)?

Thanks,
Ming


Re: [PATCH 4/5] block: introduce helpers for allocating IO buffers from slab

2018-10-18 Thread Christoph Hellwig
On Thu, Oct 18, 2018 at 08:11:23AM -0700, Matthew Wilcox wrote:
> On Thu, Oct 18, 2018 at 04:42:07PM +0200, Christoph Hellwig wrote:
> > This all seems quite complicated.
> > 
> > I think the interface we'd want is more one that keeps a little
> > cache of a single page in the queue, plus a little bitmap recording
> > which sub-page-size blocks of it are in use.
> > 
> > Something like (pseudo code, minus locking):
> > 
> > void *blk_alloc_sector_buffer(struct block_device *bdev, gfp_t gfp)
> > {
> > 	unsigned block_size = block_size(bdev);
> > 
> > 	if (block_size >= PAGE_SIZE)
> > 		return (void *)__get_free_pages(gfp, get_order(block_size));
> > 
> > find_again:
> > 	if (bdev->fragment_cache_page) {
> > 		[ search fragment_cache_page using
> > 		  e.g. a bitmap and return if found ]
> > 	}
> > 
> > 	bdev->fragment_cache_page = (void *)__get_free_page(gfp);
> > 	goto find_again;
> > }
> 
> This looks a lot like page_frag_alloc() except I think page_frag_alloc()
> may be more efficient.

Oh, nice. Sounds like XFS should just use page_frag_alloc.  I'll give
it a spin.


Re: [PATCH 4/5] block: introduce helpers for allocating IO buffers from slab

2018-10-18 Thread Matthew Wilcox
On Thu, Oct 18, 2018 at 04:42:07PM +0200, Christoph Hellwig wrote:
> This all seems quite complicated.
> 
> I think the interface we'd want is more one that keeps a little
> cache of a single page in the queue, plus a little bitmap recording
> which sub-page-size blocks of it are in use.
> 
> Something like (pseudo code, minus locking):
> 
> void *blk_alloc_sector_buffer(struct block_device *bdev, gfp_t gfp)
> {
> 	unsigned block_size = block_size(bdev);
> 
> 	if (block_size >= PAGE_SIZE)
> 		return (void *)__get_free_pages(gfp, get_order(block_size));
> 
> find_again:
> 	if (bdev->fragment_cache_page) {
> 		[ search fragment_cache_page using
> 		  e.g. a bitmap and return if found ]
> 	}
> 
> 	bdev->fragment_cache_page = (void *)__get_free_page(gfp);
> 	goto find_again;
> }

This looks a lot like page_frag_alloc() except I think page_frag_alloc()
may be more efficient.
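[Editor's note: for readers who haven't met page_frag_alloc(), it hands out
fragments of a cached page via a moving offset, refilling when the page is
exhausted. The following is a minimal userspace model of that behavior; the
struct and function names are illustrative, not the kernel API, and the real
implementation additionally refcounts the page so fragments can be freed
lazily.]

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u

/* Toy model of a page_frag_cache: one backing page and a bump offset. */
struct frag_cache {
	unsigned char *page;	/* backing page, NULL until first use */
	unsigned int offset;	/* next free byte within the page */
};

/*
 * Hand out 'size' bytes from the cached page, grabbing a fresh page when
 * the current one can't satisfy the request.  (The old page is leaked in
 * this toy model; the kernel version refcounts it instead.)
 */
static void *frag_alloc(struct frag_cache *fc, unsigned int size)
{
	if (size > PAGE_SIZE)
		return NULL;
	if (!fc->page || fc->offset + size > PAGE_SIZE) {
		fc->page = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
		if (!fc->page)
			return NULL;
		fc->offset = 0;
	}
	void *p = fc->page + fc->offset;
	fc->offset += size;
	return p;
}
```

Two back-to-back 512-byte allocations land in the same page, 512 bytes
apart, which is why 512-aligned sub-page buffers fall out of it naturally.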


Re: [PATCH 4/5] block: introduce helpers for allocating IO buffers from slab

2018-10-18 Thread Christoph Hellwig
This all seems quite complicated.

I think the interface we'd want is more one that keeps a little
cache of a single page in the queue, plus a little bitmap recording
which sub-page-size blocks of it are in use.

Something like (pseudo code, minus locking):

void *blk_alloc_sector_buffer(struct block_device *bdev, gfp_t gfp)
{
	unsigned block_size = block_size(bdev);

	if (block_size >= PAGE_SIZE)
		return (void *)__get_free_pages(gfp, get_order(block_size));

find_again:
	if (bdev->fragment_cache_page) {
		[ search fragment_cache_page using
		  e.g. a bitmap and return if found ]
	}

	bdev->fragment_cache_page = (void *)__get_free_page(gfp);
	goto find_again;
}

Note that the locking is also important; preferably we'd be able to do
something lockless.
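[Editor's note: minus the locking, the page-plus-bitmap idea sketched above
can be written as a self-contained userspace allocator. All names and
constants here are illustrative (4 KiB pages, 512-byte blocks), not the
proposed kernel interface.]

```c
#include <stdlib.h>

#define PAGE_SIZE 4096u
#define BLK_SHIFT 9			/* 512-byte sub-blocks */
#define NR_BLKS   (PAGE_SIZE >> BLK_SHIFT)

struct sector_cache {
	unsigned char *page;		/* the one cached page */
	unsigned int used;		/* bitmap: bit i set => block i in use */
};

/* Allocate nblks contiguous 512-byte blocks from the cached page. */
static void *sector_buf_alloc(struct sector_cache *sc, unsigned int nblks)
{
	unsigned int mask = (1u << nblks) - 1;

	if (!sc->page) {
		sc->page = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
		if (!sc->page)
			return NULL;
		sc->used = 0;
	}
	for (unsigned int i = 0; i + nblks <= NR_BLKS; i++) {
		if (!(sc->used & (mask << i))) {
			sc->used |= mask << i;
			return sc->page + (i << BLK_SHIFT);
		}
	}
	return NULL;			/* page full; caller must fall back */
}

static void sector_buf_free(struct sector_cache *sc, void *p,
			    unsigned int nblks)
{
	unsigned int i = (unsigned int)
		((unsigned char *)p - sc->page) >> BLK_SHIFT;

	sc->used &= ~(((1u << nblks) - 1) << i);
}
```

A kernel version would need a spinlock (or a cmpxchg loop on the bitmap
word, for the lockless variant suggested above) around the search-and-set.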


[PATCH 4/5] block: introduce helpers for allocating IO buffers from slab

2018-10-18 Thread Ming Lei
One big issue is that a buffer allocated from slab has to respect the
queue's DMA alignment limit.

This patch supports creating one kmem_cache for each less-than-PAGE_SIZE
allocation size, and makes sure that each allocation is aligned to the
queue's DMA alignment.

Allocations of PAGE_SIZE or more should be done directly via the buddy
allocator.
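[Editor's note: the allocation path below is truncated, but the slab classes
the patch creates (512 bytes per step) imply a size-to-cache mapping like the
one sketched here. The helper name is hypothetical and the constants assume
4 KiB pages; this only demonstrates the index arithmetic, not the patch's
actual code.]

```c
#define PAGE_SIZE   4096u
#define SECTOR_SIZE 512u

/*
 * Map a request of 'size' bytes (0 < size < PAGE_SIZE) to a slab index,
 * where slab i serves allocations up to 512 * (i + 1) bytes.  Sizes of
 * PAGE_SIZE or more are served by the buddy allocator, not a slab.
 */
static int sec_buf_slab_index(unsigned int size)
{
	if (size == 0 || size >= PAGE_SIZE)
		return -1;
	return (int)((size + SECTOR_SIZE - 1) / SECTOR_SIZE) - 1;
}
```

So a 513-byte request rounds up to the 1024-byte cache (index 1), and each
cache is created with queue_dma_alignment(q) + 1 as its alignment so the
returned buffers satisfy the DMA constraint.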

Cc: Vitaly Kuznetsov 
Cc: Dave Chinner 
Cc: Linux FS Devel 
Cc: Darrick J. Wong 
Cc: x...@vger.kernel.org
Cc: Dave Chinner 
Cc: Christoph Hellwig 
Cc: Bart Van Assche 
Cc: Matthew Wilcox 
Signed-off-by: Ming Lei 
---
 block/Makefile  |   3 +-
 block/blk-core.c|   2 +
 block/blk-sec-buf.c | 144 
 include/linux/blk-sec-buf.h |  43 +
 include/linux/blkdev.h  |   5 ++
 5 files changed, 196 insertions(+), 1 deletion(-)
 create mode 100644 block/blk-sec-buf.c
 create mode 100644 include/linux/blk-sec-buf.h

diff --git a/block/Makefile b/block/Makefile
index 27eac600474f..74f3ed6ef954 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -9,7 +9,8 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-tag.o blk-sysfs.o \
 	blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
 	blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
 	genhd.o partition-generic.o ioprio.o \
-	badblocks.o partitions/ blk-rq-qos.o
+	badblocks.o partitions/ blk-rq-qos.o \
+	blk-sec-buf.o
 
 obj-$(CONFIG_BOUNCE)   += bounce.o
 obj-$(CONFIG_BLK_SCSI_REQUEST) += scsi_ioctl.o
diff --git a/block/blk-core.c b/block/blk-core.c
index cdfabc5646da..02fe17dd5e67 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1079,6 +1079,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id,
 	if (blkcg_init_queue(q))
 		goto fail_ref;
 
+	mutex_init(&q->blk_sec_buf_slabs_mutex);
+
return q;
 
 fail_ref:
diff --git a/block/blk-sec-buf.c b/block/blk-sec-buf.c
new file mode 100644
index ..2842a913a3d1
--- /dev/null
+++ b/block/blk-sec-buf.c
@@ -0,0 +1,144 @@
+/*
+ * Sector size level IO buffer allocation helpers for less-than PAGE_SIZE
+ * allocation.
+ *
+ * Controllers may have DMA alignment requirements; meanwhile a filesystem
+ * or other upper-layer component may allocate an IO buffer via slab and
+ * submit a bio with this buffer directly. Then the DMA alignment limit
+ * can't be respected.
+ *
+ * Create DMA aligned slab, and allocate this less-than PAGE_SIZE IO buffer
+ * from the created slab for above users.
+ *
+ * Copyright (C) 2018 Ming Lei 
+ *
+ */
+#include 
+#include 
+
+static void __blk_destroy_sec_buf_slabs(struct blk_sec_buf_slabs *slabs)
+{
+   int i;
+
+   if (!slabs)
+   return;
+
+   for (i = 0; i < BLK_NR_SEC_BUF_SLAB; i++)
+   kmem_cache_destroy(slabs->slabs[i]);
+   kfree(slabs);
+}
+
+void blk_destroy_sec_buf_slabs(struct request_queue *q)
+{
+   mutex_lock(&q->blk_sec_buf_slabs_mutex);
+   if (q->sec_buf_slabs && !--q->sec_buf_slabs->ref_cnt) {
+   __blk_destroy_sec_buf_slabs(q->sec_buf_slabs);
+   q->sec_buf_slabs = NULL;
+   }
+   mutex_unlock(&q->blk_sec_buf_slabs_mutex);
+}
+EXPORT_SYMBOL_GPL(blk_destroy_sec_buf_slabs);
+
+int blk_create_sec_buf_slabs(char *name, struct request_queue *q)
+{
+   struct blk_sec_buf_slabs *slabs;
+   char *slab_name;
+   int i;
+   int nr_slabs = BLK_NR_SEC_BUF_SLAB;
+   int ret = -ENOMEM;
+
+   /* No need to create kmem_cache if kmalloc is fine */
+   if (!q || queue_dma_alignment(q) < ARCH_KMALLOC_MINALIGN)
+   return 0;
+
+   slab_name = kmalloc(strlen(name) + 5, GFP_KERNEL);
+   if (!slab_name)
+   return ret;
+
+   mutex_lock(&q->blk_sec_buf_slabs_mutex);
+   if (q->sec_buf_slabs) {
+   q->sec_buf_slabs->ref_cnt++;
+   ret = 0;
+   goto out;
+   }
+
+   slabs = kzalloc(sizeof(*slabs), GFP_KERNEL);
+   if (!slabs)
+   goto out;
+
+   for (i = 0; i < nr_slabs; i++) {
+   int size = (i == nr_slabs - 1) ? PAGE_SIZE - 512 : (i + 1) << 9;
+
+   sprintf(slab_name, "%s-%d", name, i);
+   slabs->slabs[i] = kmem_cache_create(slab_name, size,
+   queue_dma_alignment(q) + 1,
+   SLAB_PANIC, NULL);
+   if (!slabs->slabs[i])
+   goto fail;
+   }
+
+   slabs->ref_cnt = 1;
+   q->sec_buf_slabs = slabs;
+   ret = 0;
+   goto out;
+
+ fail:
+   __blk_destroy_sec_buf_slabs(slabs);
+ out:
+   mutex_unlock(&q->blk_sec_buf_slabs_mutex);
+   kfree(slab_name);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(blk_create_sec_buf_slabs);
+
+void *blk_alloc_sec_buf(struct request_queue *q, int size, gfp_t flags)
+{
+   int i;
+
+   /* We only serve less-than PAGE_SIZE