Re: [PATCH 4/5] block: introduce helpers for allocating IO buffers from slab
On Fri, Oct 19, 2018 at 10:53:49AM +0800, Ming Lei wrote:
> On Thu, Oct 18, 2018 at 05:22:19PM +0200, Christoph Hellwig wrote:
> > On Thu, Oct 18, 2018 at 08:11:23AM -0700, Matthew Wilcox wrote:
> > > On Thu, Oct 18, 2018 at 04:42:07PM +0200, Christoph Hellwig wrote:
> > > > This all seems quite complicated.
> > > >
> > > > I think the interface we'd want is more one that has a little
> > > > cache of a single page in the queue, and a little bitmap of which
> > > > sub-page-size blocks of it are used.
> > > >
> > > > Something like (pseudo code minus locking):
> > > >
> > > > void *blk_alloc_sector_buffer(struct block_device *bdev, gfp_t gfp)
> > > > {
> > > > 	unsigned block_size = block_size(bdev);
> > > >
> > > > 	if (block_size >= PAGE_SIZE)
> > > > 		return (void *)__get_free_pages(gfp,
> > > > 				get_order(block_size));
> > > >
> > > > find_again:
> > > > 	if (bdev->fragment_cache_page) {
> > > > 		[ search fragment_cache_page using e.g. a bitmap and
> > > > 		  return the fragment if found ]
> > > > 	}
> > > >
> > > > 	bdev->fragment_cache_page = (void *)__get_free_page(gfp);
> > > > 	goto find_again;
> > > > }
> > >
> > > This looks a lot like page_frag_alloc() except I think page_frag_alloc()
> > > may be more efficient.
> >
> > Oh, nice.  Sounds like XFS should just use page_frag_alloc.  I'll give
> > it a spin.
>
> XFS or other filesystems can use page_frag_alloc() directly; it no longer
> seems necessary to introduce this change in the block layer, given that a
> 512-aligned buffer should be fine everywhere.
>
> The only benefit of making it a block-layer helper is that the offset or
> size can be checked against q->dma_alignment.
>
> Dave/Jens, which way do you think is better?  Provide the allocation as a
> block-layer helper, or have filesystems use page_frag_alloc() directly for
> allocating 512*N-byte buffers (total size less than PAGE_SIZE)?

Christoph has already said he's looking at using page_frag_alloc()
directly in XFS.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
Re: [PATCH 4/5] block: introduce helpers for allocating IO buffers from slab
On 10/18/18 8:53 PM, Ming Lei wrote:
> On Thu, Oct 18, 2018 at 05:22:19PM +0200, Christoph Hellwig wrote:
>> On Thu, Oct 18, 2018 at 08:11:23AM -0700, Matthew Wilcox wrote:
>>> On Thu, Oct 18, 2018 at 04:42:07PM +0200, Christoph Hellwig wrote:
>>>> This all seems quite complicated.
>>>>
>>>> I think the interface we'd want is more one that has a little
>>>> cache of a single page in the queue, and a little bitmap of which
>>>> sub-page-size blocks of it are used.
>>>>
>>>> Something like (pseudo code minus locking):
>>>>
>>>> void *blk_alloc_sector_buffer(struct block_device *bdev, gfp_t gfp)
>>>> {
>>>> 	unsigned block_size = block_size(bdev);
>>>>
>>>> 	if (block_size >= PAGE_SIZE)
>>>> 		return (void *)__get_free_pages(gfp,
>>>> 				get_order(block_size));
>>>>
>>>> find_again:
>>>> 	if (bdev->fragment_cache_page) {
>>>> 		[ search fragment_cache_page using e.g. a bitmap and
>>>> 		  return the fragment if found ]
>>>> 	}
>>>>
>>>> 	bdev->fragment_cache_page = (void *)__get_free_page(gfp);
>>>> 	goto find_again;
>>>> }
>>>
>>> This looks a lot like page_frag_alloc() except I think page_frag_alloc()
>>> may be more efficient.
>>
>> Oh, nice.  Sounds like XFS should just use page_frag_alloc.  I'll give
>> it a spin.
>
> XFS or other filesystems can use page_frag_alloc() directly; it no longer
> seems necessary to introduce this change in the block layer, given that a
> 512-aligned buffer should be fine everywhere.
>
> The only benefit of making it a block-layer helper is that the offset or
> size can be checked against q->dma_alignment.
>
> Dave/Jens, which way do you think is better?  Provide the allocation as a
> block-layer helper, or have filesystems use page_frag_alloc() directly for
> allocating 512*N-byte buffers (total size less than PAGE_SIZE)?

I'd greatly prefer having the FS use that directly; it seems kind of
pointless to provide an abstraction at that point.

-- 
Jens Axboe
Re: [PATCH 4/5] block: introduce helpers for allocating IO buffers from slab
On Thu, Oct 18, 2018 at 05:22:19PM +0200, Christoph Hellwig wrote:
> On Thu, Oct 18, 2018 at 08:11:23AM -0700, Matthew Wilcox wrote:
> > On Thu, Oct 18, 2018 at 04:42:07PM +0200, Christoph Hellwig wrote:
> > > This all seems quite complicated.
> > >
> > > I think the interface we'd want is more one that has a little
> > > cache of a single page in the queue, and a little bitmap of which
> > > sub-page-size blocks of it are used.
> > >
> > > Something like (pseudo code minus locking):
> > >
> > > void *blk_alloc_sector_buffer(struct block_device *bdev, gfp_t gfp)
> > > {
> > > 	unsigned block_size = block_size(bdev);
> > >
> > > 	if (block_size >= PAGE_SIZE)
> > > 		return (void *)__get_free_pages(gfp,
> > > 				get_order(block_size));
> > >
> > > find_again:
> > > 	if (bdev->fragment_cache_page) {
> > > 		[ search fragment_cache_page using e.g. a bitmap and
> > > 		  return the fragment if found ]
> > > 	}
> > >
> > > 	bdev->fragment_cache_page = (void *)__get_free_page(gfp);
> > > 	goto find_again;
> > > }
> >
> > This looks a lot like page_frag_alloc() except I think page_frag_alloc()
> > may be more efficient.
>
> Oh, nice.  Sounds like XFS should just use page_frag_alloc.  I'll give
> it a spin.

XFS or other filesystems can use page_frag_alloc() directly; it no longer
seems necessary to introduce this change in the block layer, given that a
512-aligned buffer should be fine everywhere.

The only benefit of making it a block-layer helper is that the offset or
size can be checked against q->dma_alignment.

Dave/Jens, which way do you think is better?  Provide the allocation as a
block-layer helper, or have filesystems use page_frag_alloc() directly for
allocating 512*N-byte buffers (total size less than PAGE_SIZE)?

Thanks,
Ming
Re: [PATCH 4/5] block: introduce helpers for allocating IO buffers from slab
On Thu, Oct 18, 2018 at 08:11:23AM -0700, Matthew Wilcox wrote:
> On Thu, Oct 18, 2018 at 04:42:07PM +0200, Christoph Hellwig wrote:
> > This all seems quite complicated.
> >
> > I think the interface we'd want is more one that has a little
> > cache of a single page in the queue, and a little bitmap of which
> > sub-page-size blocks of it are used.
> >
> > Something like (pseudo code minus locking):
> >
> > void *blk_alloc_sector_buffer(struct block_device *bdev, gfp_t gfp)
> > {
> > 	unsigned block_size = block_size(bdev);
> >
> > 	if (block_size >= PAGE_SIZE)
> > 		return (void *)__get_free_pages(gfp,
> > 				get_order(block_size));
> >
> > find_again:
> > 	if (bdev->fragment_cache_page) {
> > 		[ search fragment_cache_page using e.g. a bitmap and
> > 		  return the fragment if found ]
> > 	}
> >
> > 	bdev->fragment_cache_page = (void *)__get_free_page(gfp);
> > 	goto find_again;
> > }
>
> This looks a lot like page_frag_alloc() except I think page_frag_alloc()
> may be more efficient.

Oh, nice.  Sounds like XFS should just use page_frag_alloc.  I'll give
it a spin.
Re: [PATCH 4/5] block: introduce helpers for allocating IO buffers from slab
On Thu, Oct 18, 2018 at 04:42:07PM +0200, Christoph Hellwig wrote:
> This all seems quite complicated.
>
> I think the interface we'd want is more one that has a little
> cache of a single page in the queue, and a little bitmap of which
> sub-page-size blocks of it are used.
>
> Something like (pseudo code minus locking):
>
> void *blk_alloc_sector_buffer(struct block_device *bdev, gfp_t gfp)
> {
> 	unsigned block_size = block_size(bdev);
>
> 	if (block_size >= PAGE_SIZE)
> 		return (void *)__get_free_pages(gfp,
> 				get_order(block_size));
>
> find_again:
> 	if (bdev->fragment_cache_page) {
> 		[ search fragment_cache_page using e.g. a bitmap and
> 		  return the fragment if found ]
> 	}
>
> 	bdev->fragment_cache_page = (void *)__get_free_page(gfp);
> 	goto find_again;
> }

This looks a lot like page_frag_alloc() except I think page_frag_alloc()
may be more efficient.
Re: [PATCH 4/5] block: introduce helpers for allocating IO buffers from slab
This all seems quite complicated.

I think the interface we'd want is more one that has a little
cache of a single page in the queue, and a little bitmap of which
sub-page-size blocks of it are used.

Something like (pseudo code minus locking):

void *blk_alloc_sector_buffer(struct block_device *bdev, gfp_t gfp)
{
	unsigned block_size = block_size(bdev);

	if (block_size >= PAGE_SIZE)
		return (void *)__get_free_pages(gfp,
				get_order(block_size));

find_again:
	if (bdev->fragment_cache_page) {
		[ search fragment_cache_page using e.g. a bitmap and
		  return the fragment if found ]
	}

	bdev->fragment_cache_page = (void *)__get_free_page(gfp);
	goto find_again;
}

Note that the locking is also important; preferably we'd be able to do
something lockless.
[PATCH 4/5] block: introduce helpers for allocating IO buffers from slab
One big issue is that a buffer allocated from slab has to respect the
queue's DMA alignment limit.

This patch supports creating one kmem_cache per less-than-PAGE_SIZE
allocation size, and makes sure each allocation is aligned to the queue's
DMA alignment.  Allocations of PAGE_SIZE or more should be done directly
via the buddy allocator.

Cc: Vitaly Kuznetsov
Cc: Dave Chinner
Cc: Linux FS Devel
Cc: Darrick J. Wong
Cc: x...@vger.kernel.org
Cc: Dave Chinner
Cc: Christoph Hellwig
Cc: Bart Van Assche
Cc: Matthew Wilcox
Signed-off-by: Ming Lei
---
 block/Makefile              |   3 +-
 block/blk-core.c            |   2 +
 block/blk-sec-buf.c         | 144 +++++++++++++++++++++++++++++++++++
 include/linux/blk-sec-buf.h |  43 ++++++++++
 include/linux/blkdev.h      |   5 ++
 5 files changed, 196 insertions(+), 1 deletion(-)
 create mode 100644 block/blk-sec-buf.c
 create mode 100644 include/linux/blk-sec-buf.h

diff --git a/block/Makefile b/block/Makefile
index 27eac600474f..74f3ed6ef954 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -9,7 +9,8 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-tag.o blk-sysfs.o \
 			blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
 			blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
 			genhd.o partition-generic.o ioprio.o \
-			badblocks.o partitions/ blk-rq-qos.o
+			badblocks.o partitions/ blk-rq-qos.o \
+			blk-sec-buf.o
 
 obj-$(CONFIG_BOUNCE)		+= bounce.o
 obj-$(CONFIG_BLK_SCSI_REQUEST)	+= scsi_ioctl.o
diff --git a/block/blk-core.c b/block/blk-core.c
index cdfabc5646da..02fe17dd5e67 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1079,6 +1079,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id,
 	if (blkcg_init_queue(q))
 		goto fail_ref;
 
+	mutex_init(&q->blk_sec_buf_slabs_mutex);
+
 	return q;
 
 fail_ref:
diff --git a/block/blk-sec-buf.c b/block/blk-sec-buf.c
new file mode 100644
index 000000000000..2842a913a3d1
--- /dev/null
+++ b/block/blk-sec-buf.c
@@ -0,0 +1,144 @@
+/*
+ * Sector-size-level IO buffer allocation helpers for less-than-PAGE_SIZE
+ * allocation.
+ *
+ * Controllers may have DMA alignment requirements; meanwhile a filesystem
+ * or other upper-layer component may allocate an IO buffer via slab and
+ * submit a bio with this buffer directly.  Then the DMA alignment limit
+ * can't be respected.
+ *
+ * Create DMA-aligned slabs, and allocate these less-than-PAGE_SIZE IO
+ * buffers from the created slabs for the above users.
+ *
+ * Copyright (C) 2018 Ming Lei
+ *
+ */
+#include <linux/kernel.h>
+#include <linux/blk-sec-buf.h>
+
+static void __blk_destroy_sec_buf_slabs(struct blk_sec_buf_slabs *slabs)
+{
+	int i;
+
+	if (!slabs)
+		return;
+
+	for (i = 0; i < BLK_NR_SEC_BUF_SLAB; i++)
+		kmem_cache_destroy(slabs->slabs[i]);
+	kfree(slabs);
+}
+
+void blk_destroy_sec_buf_slabs(struct request_queue *q)
+{
+	mutex_lock(&q->blk_sec_buf_slabs_mutex);
+	if (q->sec_buf_slabs && !--q->sec_buf_slabs->ref_cnt) {
+		__blk_destroy_sec_buf_slabs(q->sec_buf_slabs);
+		q->sec_buf_slabs = NULL;
+	}
+	mutex_unlock(&q->blk_sec_buf_slabs_mutex);
+}
+EXPORT_SYMBOL_GPL(blk_destroy_sec_buf_slabs);
+
+int blk_create_sec_buf_slabs(char *name, struct request_queue *q)
+{
+	struct blk_sec_buf_slabs *slabs;
+	char *slab_name;
+	int i;
+	int nr_slabs = BLK_NR_SEC_BUF_SLAB;
+	int ret = -ENOMEM;
+
+	/* No need to create kmem_cache if kmalloc is fine */
+	if (!q || queue_dma_alignment(q) < ARCH_KMALLOC_MINALIGN)
+		return 0;
+
+	slab_name = kmalloc(strlen(name) + 5, GFP_KERNEL);
+	if (!slab_name)
+		return ret;
+
+	mutex_lock(&q->blk_sec_buf_slabs_mutex);
+	if (q->sec_buf_slabs) {
+		q->sec_buf_slabs->ref_cnt++;
+		ret = 0;
+		goto out;
+	}
+
+	slabs = kzalloc(sizeof(*slabs), GFP_KERNEL);
+	if (!slabs)
+		goto out;
+
+	for (i = 0; i < nr_slabs; i++) {
+		int size = (i == nr_slabs - 1) ? PAGE_SIZE - 512 :
+			(i + 1) << 9;
+
+		sprintf(slab_name, "%s-%d", name, i);
+		slabs->slabs[i] = kmem_cache_create(slab_name, size,
+				queue_dma_alignment(q) + 1,
+				SLAB_PANIC, NULL);
+		if (!slabs->slabs[i])
+			goto fail;
+	}
+
+	slabs->ref_cnt = 1;
+	q->sec_buf_slabs = slabs;
+	ret = 0;
+	goto out;
+
+ fail:
+	__blk_destroy_sec_buf_slabs(slabs);
+ out:
+	mutex_unlock(&q->blk_sec_buf_slabs_mutex);
+	kfree(slab_name);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blk_create_sec_buf_slabs);
+
+void *blk_alloc_sec_buf(struct request_queue *q, int size, gfp_t flags)
+{
+	int i;
+
+	/* We only serve less-than PAGE_SIZE