Re: [PATCH] sgtable over sglist (Re: [RFC 4/8] scsi-ml: scsi_sgtable implementation)

2007-07-25 Thread Boaz Harrosh
Comments about this patch are embedded inline below.

FUJITA Tomonori wrote:
 
 I've attached the sgtable patch for review. It's against the
 sglist-arch branch in Jens' tree.
 
 ---
 From: FUJITA Tomonori [EMAIL PROTECTED]
 Subject: [PATCH] move all the I/O descriptors to a new scsi_sgtable structure
 
 Based on Boaz Harrosh's [EMAIL PROTECTED] scsi_sgtable patch.
 
 Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED]
 ---
  drivers/scsi/scsi_lib.c  |   92 +++--
  include/scsi/scsi_cmnd.h |   39 +--
  2 files changed, 82 insertions(+), 49 deletions(-)
 
 diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
 index 5fb1048..2fa1852 100644
 --- a/drivers/scsi/scsi_lib.c
 +++ b/drivers/scsi/scsi_lib.c
 @@ -52,6 +52,8 @@ static struct scsi_host_sg_pool scsi_sg_pools[] = {
  };
  #undef SP
  
 +static struct kmem_cache *scsi_sgtable_cache;
 +
One more slab pool just to do regular I/O.

  static void scsi_run_queue(struct request_queue *q);
  
  /*
 @@ -731,16 +733,27 @@ static inline unsigned int scsi_sgtable_index(unsigned short nents)
   return index;
  }
  
 -struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
 +struct scsi_sgtable *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask,
 +				int sg_count)
  {
   struct scsi_host_sg_pool *sgp;
   struct scatterlist *sgl, *prev, *ret;
 + struct scsi_sgtable *sgt;
   unsigned int index;
   int this, left;
  
 - BUG_ON(!cmd->use_sg);
 + sgt = kmem_cache_zalloc(scsi_sgtable_cache, gfp_mask);
 + if (!sgt)
 + return NULL;
One more allocation that can fail for every I/O, even when we
already have a scsi_cmnd.

 +
 + /*
 +  * don't allow subsequent mempool allocs to sleep, it would
 +  * violate the mempool principle.
 +  */
 + gfp_mask &= ~__GFP_WAIT;
 + gfp_mask |= __GFP_HIGH;
We used to sometimes wait for that large 128-entry scatterlist
full page. Waiting for this small allocation is probably fine,
but now we no longer wait for the big allocations below.

  
 - left = cmd->use_sg;
 + left = sg_count;
   ret = prev = NULL;
   do {
   this = left;
 @@ -764,7 +777,7 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
* first loop through, set initial index and return value
*/
   if (!ret) {
 - cmd->sglist_len = index;
 + sgt->sglist_len = index;
   ret = sgl;
   }
  
 @@ -776,21 +789,18 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
   if (prev)
   sg_chain(prev, SCSI_MAX_SG_SEGMENTS, sgl);
  
 - /*
 -  * don't allow subsequent mempool allocs to sleep, it would
 -  * violate the mempool principle.
 -  */
 - gfp_mask &= ~__GFP_WAIT;
 - gfp_mask |= __GFP_HIGH;
   prev = sgl;
   } while (left);
  
   /*
 -  * ->use_sg may get modified after dma mapping has potentially
 +  * ->sg_count may get modified after dma mapping has potentially
* shrunk the number of segments, so keep a copy of it for free.
*/
 - cmd->__use_sg = cmd->use_sg;
 - return ret;
 + sgt->sg_count = sg_count;
 + sgt->__sg_count = sg_count;
 + sgt->sglist = ret;
 + cmd->sgtable = sgt;
We used to set that in scsi_init_io(), so scsi_alloc_sgtable()
could be called twice for bidi. Now we cannot. Also, the API
change is not bidi-friendly and will have to change (it used to
be fine). And if you set it here, why do you also return it?

 + return sgt;
  enomem:
   if (ret) {
   /*
 @@ -809,6 +819,8 @@ enomem:
  
   mempool_free(prev, sgp-pool);
   }
 + kmem_cache_free(scsi_sgtable_cache, sgt);
 +
   return NULL;
  }
  
 @@ -816,21 +828,22 @@ EXPORT_SYMBOL(scsi_alloc_sgtable);
  
  void scsi_free_sgtable(struct scsi_cmnd *cmd)
  {
 - struct scatterlist *sgl = cmd->request_buffer;
 + struct scsi_sgtable *sgt = cmd->sgtable;
 + struct scatterlist *sgl = sgt->sglist;
   struct scsi_host_sg_pool *sgp;
  
 - BUG_ON(cmd->sglist_len >= SG_MEMPOOL_NR);
 + BUG_ON(sgt->sglist_len >= SG_MEMPOOL_NR);
  
   /*
* if this is the biggest size sglist, check if we have
* chained parts we need to free
*/
 - if (cmd->__use_sg > SCSI_MAX_SG_SEGMENTS) {
 + if (sgt->__sg_count > SCSI_MAX_SG_SEGMENTS) {
   unsigned short this, left;
   struct scatterlist *next;
   unsigned int index;
  
 - left = cmd->__use_sg - (SCSI_MAX_SG_SEGMENTS - 1);
 + left = sgt->__sg_count - (SCSI_MAX_SG_SEGMENTS - 1);
   next = sg_chain_ptr(sgl[SCSI_MAX_SG_SEGMENTS - 1]);
   while (left && next) {
   sgl = next;
 @@ -854,11 +867,12 @@ void 

[PATCH] sgtable over sglist (Re: [RFC 4/8] scsi-ml: scsi_sgtable implementation)

2007-07-23 Thread FUJITA Tomonori
From: Jens Axboe [EMAIL PROTECTED]
Subject: Re: [RFC 4/8] scsi-ml: scsi_sgtable implementation
Date: Wed, 18 Jul 2007 22:17:10 +0200

 On Wed, Jul 18 2007, Benny Halevy wrote:
  Jens Axboe wrote:
   On Wed, Jul 18 2007, Boaz Harrosh wrote:
   Jens Axboe wrote:
   On Wed, Jul 18 2007, Boaz Harrosh wrote:
   FUJITA Tomonori wrote:
   From: Mike Christie [EMAIL PROTECTED]
   Subject: Re: [RFC 4/8] scsi-ml: scsi_sgtable implementation
   Date: Thu, 12 Jul 2007 14:09:44 -0500
  
   Boaz Harrosh wrote:
   +/*
   + * Should fit within a single page.
   + */
   +enum { SCSI_MAX_SG_SEGMENTS =
   +   ((PAGE_SIZE - sizeof(struct scsi_sgtable)) /
   +   sizeof(struct scatterlist)) };
   +
   +enum { SG_MEMPOOL_NR =
   +   (SCSI_MAX_SG_SEGMENTS >= 7) +
   +   (SCSI_MAX_SG_SEGMENTS >= 15) +
   +   (SCSI_MAX_SG_SEGMENTS >= 31) +
   +   (SCSI_MAX_SG_SEGMENTS >= 63) +
   +   (SCSI_MAX_SG_SEGMENTS >= 127) +
   +   (SCSI_MAX_SG_SEGMENTS >= 255) +
   +   (SCSI_MAX_SG_SEGMENTS >= 511)
   +};

   What does SCSI_MAX_SG_SEGMENTS end up being on x86 now? On x86_64 or 
   some other arch, we were going over a page when doing 
   SCSI_MAX_PHYS_SEGMENTS of 256 right?
   Seems that 170 with x86 and 127 with x86_64.
  
   with scsi_sgtable we get one less than now
  
   Arch                     | SCSI_MAX_SG_SEGMENTS = | sizeof(struct scatterlist)
   -------------------------|------------------------|---------------------------
   x86_64                   | 127                    | 32
   i386 CONFIG_HIGHMEM64G=y | 204                    | 20
   i386 other               | 255                    | 16
  
   What's nice about this code is that it is now finally
   calculated automatically at compile time. Arch people
   don't have the "did I break SCSI-ml?" headache.
   For example, observe the current bug with i386
   CONFIG_HIGHMEM64G=y.
  
   The same should be done with BIOs. Then archs with big
   pages can gain even more.
  
   What happened to Jens's scatter list chaining and how does this 
   relate 
   to it then?
   With Jens' sglist, we can set SCSI_MAX_SG_SEGMENTS to whatever we
   want. We can remove the above code.
  
   We need to push this and Jens' sglist together in one merge window, I
   think.
   No Tomo the above does not go away. What goes away is maybe:
   It does go away, since we can just set it to some safe value and use
   chaining to get us where we want.
    In my patches SCSI_MAX_PHYS_SEGMENTS has gone away; it does not
    exist anymore.
   
   Sure, I could just kill it as well. The point is that it's a parallel
   development, there's nothing in your patch that helps the sg chaining
   whatsoever. The only complex thing in the SCSI layer for sg chaining,
   is chaining when allocating and walking that chain on freeing. That's
   it!
  
  It seems like having the pool index in the sgtable structure simplifies
  the implementation a bit for allocation and freeing of linked sgtables.
  Boaz will send an example tomorrow (hopefully) showing what the merged
  code looks like.
 
 The index stuff isn't complex, so I don't think you can call that a real
 simplification. It's not for free either, there's a size cost to pay.

I think that the main issue in integrating sgtable and sglist is how
to fit the scatterlists into the scsi_sgtable structure.

If we allocate a scsi_sgtable structure and sglists separately, the
code is pretty simple. But probably it's not the best way from the
perspective of performance.

If we put the sglists into the scsi_sgtable structure (like Boaz's
patch does) and allocate and chain extra sglists only for large I/Os,
we get better performance (especially for small I/Os), but more
complicated code.

I wrote my sgtable patch over Jens' sglist with the former way:

master.kernel.org:/pub/scm/linux/kernel/git/tomo/linux-2.6-bidi.git sgtable


I also put Boaz's scsi_error, sd, and sr sgtable patches into the tree
so you can try it. I've tested this only with normal-size I/Os (the
sglist chaining path is untested). I don't touch the sglist code much,
so hopefully it works.

I've attached the sgtable patch for review. It's against the
sglist-arch branch in Jens' tree.
