Re: [PATCH] sgtable over sglist (Re: [RFC 4/8] scsi-ml: scsi_sgtable implementation)
Comments about this patch embedded inside FUJITA Tomonori wrote: I've attached the sgtable patch for review. It's against the sglist-arch branch in Jens' tree. --- From: FUJITA Tomonori [EMAIL PROTECTED] Subject: [PATCH] move all the I/O descriptors to a new scsi_sgtable structure based on Boaz Harrosh [EMAIL PROTECTED] scsi_sgtable patch. Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED] --- drivers/scsi/scsi_lib.c | 92 +++-- include/scsi/scsi_cmnd.h | 39 +-- 2 files changed, 82 insertions(+), 49 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 5fb1048..2fa1852 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -52,6 +52,8 @@ static struct scsi_host_sg_pool scsi_sg_pools[] = { }; #undef SP +static struct kmem_cache *scsi_sgtable_cache; + One more slab pool to do regular IO static void scsi_run_queue(struct request_queue *q); /* @@ -731,16 +733,27 @@ static inline unsigned int scsi_sgtable_index(unsigned short nents) return index; } -struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask) +struct scsi_sgtable *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask, + int sg_count) { struct scsi_host_sg_pool *sgp; struct scatterlist *sgl, *prev, *ret; + struct scsi_sgtable *sgt; unsigned int index; int this, left; - BUG_ON(!cmd-use_sg); + sgt = kmem_cache_zalloc(scsi_sgtable_cache, gfp_mask); + if (!sgt) + return NULL; One more allocation that can fail for every io. Even if we have a scsi_cmnd. + + /* + * don't allow subsequent mempool allocs to sleep, it would + * violate the mempool principle. + */ + gfp_mask = ~__GFP_WAIT; + gfp_mask |= __GFP_HIGH; We used to sometime wait for that Large 128 scatterlist full page. But now this small allocation is probably good. but we no longer wait for the big allocation below. - left = cmd-use_sg; + left = sg_count; ret = prev = NULL; do { this = left; @@ -764,7 +777,7 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask) * first loop through, set initial index and return value */ if (!ret) { - cmd-sglist_len = index; + sgt-sglist_len = index; ret = sgl; } @@ -776,21 +789,18 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask) if (prev) sg_chain(prev, SCSI_MAX_SG_SEGMENTS, sgl); - /* - * don't allow subsequent mempool allocs to sleep, it would - * violate the mempool principle. - */ - gfp_mask = ~__GFP_WAIT; - gfp_mask |= __GFP_HIGH; prev = sgl; } while (left); /* - * -use_sg may get modified after dma mapping has potentially + * -sg_count may get modified after dma mapping has potentially * shrunk the number of segments, so keep a copy of it for free. */ - cmd-__use_sg = cmd-use_sg; - return ret; + sgt-sg_count = sg_count; + sgt-__sg_count = sg_count; + sgt-sglist = ret; + cmd-sgtable = sgt; We used to set that in scsi_init_io. So this scsi_alloc_sgtable() can be called twice for bidi. Now we can not. Also the API change is not friendly for bidi and will have to change. (It used to be good) If you set it here why do you return it. + return sgt; enomem: if (ret) { /* @@ -809,6 +819,8 @@ enomem: mempool_free(prev, sgp-pool); } + kmem_cache_free(scsi_sgtable_cache, sgt); + return NULL; } @@ -816,21 +828,22 @@ EXPORT_SYMBOL(scsi_alloc_sgtable); void scsi_free_sgtable(struct scsi_cmnd *cmd) { - struct scatterlist *sgl = cmd-request_buffer; + struct scsi_sgtable *sgt = cmd-sgtable; + struct scatterlist *sgl = sgt-sglist; struct scsi_host_sg_pool *sgp; - BUG_ON(cmd-sglist_len = SG_MEMPOOL_NR); + BUG_ON(sgt-sglist_len = SG_MEMPOOL_NR); /* * if this is the biggest size sglist, check if we have * chained parts we need to free */ - if (cmd-__use_sg SCSI_MAX_SG_SEGMENTS) { + if (sgt-__sg_count SCSI_MAX_SG_SEGMENTS) { unsigned short this, left; struct scatterlist *next; unsigned int index; - left = cmd-__use_sg - (SCSI_MAX_SG_SEGMENTS - 1); + left = sgt-__sg_count - (SCSI_MAX_SG_SEGMENTS - 1); next = sg_chain_ptr(sgl[SCSI_MAX_SG_SEGMENTS - 1]); while (left next) { sgl = next; @@ -854,11 +867,12 @@ void
[PATCH] sgtable over sglist (Re: [RFC 4/8] scsi-ml: scsi_sgtable implementation)
From: Jens Axboe [EMAIL PROTECTED] Subject: Re: [RFC 4/8] scsi-ml: scsi_sgtable implementation Date: Wed, 18 Jul 2007 22:17:10 +0200 On Wed, Jul 18 2007, Benny Halevy wrote: Jens Axboe wrote: On Wed, Jul 18 2007, Boaz Harrosh wrote: Jens Axboe wrote: On Wed, Jul 18 2007, Boaz Harrosh wrote: FUJITA Tomonori wrote: From: Mike Christie [EMAIL PROTECTED] Subject: Re: [RFC 4/8] scsi-ml: scsi_sgtable implementation Date: Thu, 12 Jul 2007 14:09:44 -0500 Boaz Harrosh wrote: +/* + * Should fit within a single page. + */ +enum { SCSI_MAX_SG_SEGMENTS = + ((PAGE_SIZE - sizeof(struct scsi_sgtable)) / + sizeof(struct scatterlist)) }; + +enum { SG_MEMPOOL_NR = + (SCSI_MAX_SG_SEGMENTS = 7) + + (SCSI_MAX_SG_SEGMENTS = 15) + + (SCSI_MAX_SG_SEGMENTS = 31) + + (SCSI_MAX_SG_SEGMENTS = 63) + + (SCSI_MAX_SG_SEGMENTS = 127) + + (SCSI_MAX_SG_SEGMENTS = 255) + + (SCSI_MAX_SG_SEGMENTS = 511) +}; What does SCSI_MAX_SG_SEGMENTS end up being on x86 now? On x86_64 or some other arch, we were going over a page when doing SCSI_MAX_PHYS_SEGMENTS of 256 right? Seems that 170 with x86 and 127 with x86_64. with scsi_sgtable we get one less than now Arch | SCSI_MAX_SG_SEGMENTS = | sizeof(struct scatterlist) --|-|--- x86_64| 127 |32 i386 CONFIG_HIGHMEM64G=y | 204 |20 i386 other| 255 |16 What's nice about this code is that now finally it is automatically calculated in compile time. Arch people don't have the headache did I break SCSI-ml?. For example observe the current bug with i386 CONFIG_HIGHMEM64G=y. The same should be done with BIO's. Than ARCHs with big pages can gain even more. What happened to Jens's scatter list chaining and how does this relate to it then? With Jens' sglist, we can set SCSI_MAX_SG_SEGMENTS to whatever we want. We can remove the above code. We need to push this and Jens' sglist together in one merge window, I think. No Tomo the above does not go away. What goes away is maybe: It does go away, since we can just set it to some safe value and use chaining to get us where we want. In my patches SCSI_MAX_PHYS_SEGMENTS has went away it does not exist anymore. Sure, I could just kill it as well. The point is that it's a parallel development, there's nothing in your patch that helps the sg chaining whatsoever. The only complex thing in the SCSI layer for sg chaining, is chaining when allocating and walking that chain on freeing. That's it! It seems like having the pool index in the sgtable structure simplifies the implementation a bit for allocation and freeing of linked sgtables. Boaz will send an example tomorrow (hopefully) showing how the merged code looks like. The index stuff isn't complex, so I don't think you can call that a real simplification. It's not for free either, there's a size cost to pay. I think that the main issue of integrating sgtable and sglist is how to put scatterlist to scsi_sgtable structure. If we allocate a scsi_sgtable structure and sglists separately, the code is pretty simple. But probably it's not the best way from the perspective of performance. If we put sglists into the scsi_sgtable structure (like Boaz's patch does) and allocate and chain sglists only for large I/Os, we would have the better performance (especially for small I/Os). But we will have more complicated code. I wrote my sgtable patch over Jens' sglist with the former way: master.kernel.org:/pub/scm/linux/kernel/git/tomo/linux-2.6-bidi.git sgtable I also put Boaz's scsi_error, sd, and sr sgtable patches into the tree so you can try it. I've tested this with only normal size I/Os (not tested sglist code). I don't touch the sglist code much, so hopefully it works. I've attached the sgtable patch for review. It's against the sglist-arch branch in Jens' tree. --- From: FUJITA Tomonori [EMAIL PROTECTED] Subject: [PATCH] move all the I/O descriptors to a new scsi_sgtable structure based on Boaz Harrosh [EMAIL PROTECTED] scsi_sgtable patch. Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED] --- drivers/scsi/scsi_lib.c | 92 +++-- include/scsi/scsi_cmnd.h | 39 +-- 2 files changed, 82 insertions(+), 49 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 5fb1048..2fa1852 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -52,6 +52,8 @@ static struct scsi_host_sg_pool scsi_sg_pools[] = { }; #undef SP +static struct kmem_cache *scsi_sgtable_cache; + static void scsi_run_queue(struct request_queue *q); /* @@ -731,16 +733,27 @@ static inline unsigned int scsi_sgtable_index(unsigned