[PATCH 0/2] (was Re: [ofa-general] fmr pool free_list empty)

2008-02-26 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Mon, 25 Feb 2008 15:02 -0800:
 Ugh.
[pw wrote:]
   Looking at the FMR dirty list unmapping code in
   ib_fmr_batch_release(), there is a section that pulls all the dirty
   entries onto a list that it will later unmap and put back on the
   free list.
 
   But it also plans to unmap all the free entries that have ever been
   remapped:
 
 Yes, this came from a3cd7d90 (IB/fmr_pool: ib_fmr_pool_flush() should
 flush all dirty FMRs).  That solved a real problem for Olaf, because
 otherwise dirty FMRs with not at the max map count might never get
 invalidated.  It's not exactly an optimization but rather a
 correctness issue, because RDS relies on killing mapping eventually.
 
 On the other hand, this behavior clearly does lead to the possibility
 of leaving the free list temporarily empty for stupid reasons.
 
 I don't see a really good way to fix this at the momemnt, need to
 meditate a little.

Adding CCs in case some iser users are not on the openfabrics list.
Original message is here:
http://lists.openfabrics.org/pipermail/general/2008-February/047111.html

This quoted commit is a regression for iSER.  Not sure if it causes
problems for the other FMR user, SRP.  It went in after v2.6.24.
Following this mail are two patches.  One to revert the change, and
one to attempt to do Olaf's patch in such a way that it does not
cause problems for other FMR users.

I haven't tested the patches with RDS.  It apparently isn't in the
tree yet.  In fact, there are no users of ib_flush_fmr_pool() in the
tree, which is the only function affected by the second patch.  But
iSER is working again in my scenario.

As a side note, I don't remember seeing this patch on the
openfabrics mailing list.  Perhaps I missed it.  Sometimes these
sorts of interactions can be spotted if proposed changes get wider
attention.

-- Pete
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] Revert IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs

2008-02-26 Thread Pete Wyckoff
This reverts commit a3cd7d9070be417a21905c997ee32d756d999b38.

The original commit breaks iSER reliably, making it complain:

iser: iser_reg_page_vec:ib_fmr_pool_map_phys failed: -11

The FMR cleanup thread runs ib_fmr_batch_release() as dirty
entries build up.  This commit causes clean but used FMR
entries also to be purged.  During that process, another thread
can see that there are no free FMRs and fail, even though
there should always have been enough available.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 drivers/infiniband/core/fmr_pool.c |   21 ++---
 1 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/core/fmr_pool.c 
b/drivers/infiniband/core/fmr_pool.c
index 7f00347..4044fdf 100644
--- a/drivers/infiniband/core/fmr_pool.c
+++ b/drivers/infiniband/core/fmr_pool.c
@@ -139,7 +139,7 @@ static inline struct ib_pool_fmr 
*ib_fmr_cache_lookup(struct ib_fmr_pool *pool,
 static void ib_fmr_batch_release(struct ib_fmr_pool *pool)
 {
int ret;
-   struct ib_pool_fmr *fmr, *next;
+   struct ib_pool_fmr *fmr;
LIST_HEAD(unmap_list);
LIST_HEAD(fmr_list);
 
@@ -158,20 +158,6 @@ static void ib_fmr_batch_release(struct ib_fmr_pool *pool)
 #endif
}
 
-   /*
-* The free_list may hold FMRs that have been put there
-* because they haven't reached the max_remap count.
-* Invalidate their mapping as well.
-*/
-   list_for_each_entry_safe(fmr, next, pool-free_list, list) {
-   if (fmr-remap_count == 0)
-   continue;
-   hlist_del_init(fmr-cache_node);
-   fmr-remap_count = 0;
-   list_add_tail(fmr-fmr-list, fmr_list);
-   list_move(fmr-list, unmap_list);
-   }
-
list_splice(pool-dirty_list, unmap_list);
INIT_LIST_HEAD(pool-dirty_list);
pool-dirty_len = 0;
@@ -384,6 +370,11 @@ void ib_destroy_fmr_pool(struct ib_fmr_pool *pool)
 
i = 0;
list_for_each_entry_safe(fmr, tmp, pool-free_list, list) {
+   if (fmr-remap_count) {
+   INIT_LIST_HEAD(fmr_list);
+   list_add_tail(fmr-fmr-list, fmr_list);
+   ib_unmap_fmr(fmr_list);
+   }
ib_dealloc_fmr(fmr-fmr);
list_del(fmr-list);
kfree(fmr);
-- 
1.5.4.1

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] ib fmr pool: flush used clean entries

2008-02-26 Thread Pete Wyckoff
Commit a3cd7d9070be417a21905c997ee32d756d999b38 (IB/fmr_pool:
ib_fmr_pool_flush() should flush all dirty FMRs) caused a
regression for iSER and was reverted in
e5507736c6449b3253347eed6f8ea77a28cf688e.

This change attempts to redo the original patch so that all used
FMR entries are flushed when ib_flush_fmr_pool() is called,
and other FMR users are not affected.  Simply move used entries
from the clean list onto the dirty list before letting the
cleanup thread do its job.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 drivers/infiniband/core/fmr_pool.c |   17 -
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/fmr_pool.c 
b/drivers/infiniband/core/fmr_pool.c
index 4044fdf..06d502c 100644
--- a/drivers/infiniband/core/fmr_pool.c
+++ b/drivers/infiniband/core/fmr_pool.c
@@ -398,8 +398,23 @@ EXPORT_SYMBOL(ib_destroy_fmr_pool);
  */
 int ib_flush_fmr_pool(struct ib_fmr_pool *pool)
 {
-   int serial = atomic_inc_return(pool-req_ser);
+   int serial;
+   struct ib_pool_fmr *fmr, *next;
+
+   /*
+* The free_list holds FMRs that may have been used
+* but have not been remapped enough times to be dirty.
+* Put them on the dirty list now so that the cleanup
+* thread will reap them too.
+*/
+   spin_lock_irq(pool-pool_lock);
+   list_for_each_entry_safe(fmr, next, pool-free_list, list) {
+   if (fmr-remap_count  0)
+   list_move(fmr-list, pool-dirty_list);
+   }
+   spin_unlock_irq(pool-pool_lock);
 
+   serial = atomic_inc_return(pool-req_ser);
wake_up_process(pool-thread);
 
if (wait_event_interruptible(pool-force_wait,
-- 
1.5.4.1

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] iscsi bidi varlen support

2008-02-18 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Mon, 18 Feb 2008 17:39 +0200:
 On Tue, Feb 12 2008 at 22:17 +0200, Pete Wyckoff [EMAIL PROTECTED] wrote:
  From: Pete Wyckoff [EMAIL PROTECTED]
  Subject: [PATCH] iscsi iser: varlen
  
  Handle variable-length CDBs in iSER.
  
  Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
  ---
   drivers/infiniband/ulp/iser/iscsi_iser.c |5 +++--
   drivers/infiniband/ulp/iser/iscsi_iser.h |2 +-
   drivers/infiniband/ulp/iser/iser_initiator.c |   16 ++--
   3 files changed, 14 insertions(+), 9 deletions(-)
  
  diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c 
  b/drivers/infiniband/ulp/iser/iscsi_iser.c
  index 5f2284d..9dfc310 100644
  --- a/drivers/infiniband/ulp/iser/iscsi_iser.c
  +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
  @@ -401,7 +401,8 @@ iscsi_iser_session_create(struct iscsi_transport 
  *iscsit,
  ctask  = session-cmds[i];
  iser_ctask = ctask-dd_data;
  ctask-hdr = (struct iscsi_cmd *)iser_ctask-desc.iscsi_header;
  -   ctask-hdr_max = sizeof(iser_ctask-desc.iscsi_header);
  +   ctask-hdr_max = sizeof(iser_ctask-desc.iscsi_header) +
  +sizeof(iser_ctask-desc.hdrextbuf);
  }
   
  for (i = 0; i  session-mgmtpool_max; i++) {
  @@ -604,7 +605,7 @@ static struct iscsi_transport iscsi_iser_transport = {
  .host_template  = iscsi_iser_sht,
  .conndata_size  = sizeof(struct iscsi_conn),
  .max_lun= ISCSI_ISER_MAX_LUN,
  -   .max_cmd_len= ISCSI_ISER_MAX_CMD_LEN,
  +   .max_cmd_len= 260,
 
 Same bug I had. .max_cmd_len is still char, before the varlen patch to 
 scsi-ml.
 So it must be at most 252, Until that patch is introduced and it can return to
 the correct 260 or better yet SCSI_MAX_VARLEN_CDB_SIZE. That also is only 
 defined in the scsi-ml varlen patch.

Ah, that is unfortunate.

 I'm afraid the varlen patches to block and scsi-ml are waiting because of
 me. There are more things I need to check, before they can get approved.
 
 Once I do that, and varlen gets accepted, iSER and iscsi_tcp can go up 
 to 260 for the .max_cmd_len as they should.

I will sit on these iser changes until we get core varlen resolved,
then.  Or you can just sequence it all cleverly through the various
maintainers.

-- Pete
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3 v2] iscsi iser: remove DMA alignment restriction

2008-02-14 Thread Pete Wyckoff
Thanks to James pointing out the problems with the BLK_BOUNCE_ANY
for IB devices, this revised patch contains only the DMA alignment
fix for iSER.

Mike, can you take care of this and the other two patches in the
series:

[PATCH 2/3] iscsi iser: increase max_sectors
[PATCH 3/3] iscsi iser: increase sg_tablesize

-- Pete


From 255e73b67ec1458af18395981fddebdc958e8fe9 Mon Sep 17 00:00:00 2001
From: Pete Wyckoff [EMAIL PROTECTED]
Date: Thu, 14 Feb 2008 16:09:27 -0500
Subject: [PATCH] iscsi iser: remove DMA alignment restriction

iscsi_iser does not require any particular DMA aligement
requirement.  Add a slave_configure function to set the alignment
to zero, allowing the use of direct IO from arbitrary offsets
within a page.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 drivers/infiniband/ulp/iser/iscsi_iser.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c 
b/drivers/infiniband/ulp/iser/iscsi_iser.c
index be1b9fb..313f102 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -543,6 +543,12 @@ iscsi_iser_ep_disconnect(__u64 ep_handle)
iser_conn_terminate(ib_conn);
 }
 
+static int iscsi_iser_slave_configure(struct scsi_device *sdev)
+{
+   blk_queue_dma_alignment(sdev-request_queue, 0);
+   return 0;
+}
+
 static struct scsi_host_template iscsi_iser_sht = {
.module = THIS_MODULE,
.name   = iSCSI Initiator over iSER, v. DRV_VER,
@@ -556,6 +562,7 @@ static struct scsi_host_template iscsi_iser_sht = {
.eh_device_reset_handler= iscsi_eh_device_reset,
.eh_host_reset_handler  = iscsi_eh_host_reset,
.use_clustering = DISABLE_CLUSTERING,
+   .slave_configure= iscsi_iser_slave_configure,
.proc_name  = iscsi_iser,
.this_id= -1,
 };
-- 
1.5.3.8

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] iscsi iser: remove DMA restrictions

2008-02-13 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Tue, 12 Feb 2008 15:57 -0600:
 On Tue, 2008-02-12 at 16:46 -0500, Pete Wyckoff wrote:
  [EMAIL PROTECTED] wrote on Tue, 12 Feb 2008 15:10 -0600:
   On Tue, 2008-02-12 at 15:54 -0500, Pete Wyckoff wrote:
iscsi_iser does not have any hardware DMA restrictions.  Add a
slave_configure function to remove any DMA alignment restriction,
allowing the use of direct IO from arbitrary offsets within a page.
Also disable page bouncing; iser has no restrictions on which pages it
can address.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 drivers/infiniband/ulp/iser/iscsi_iser.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c 
b/drivers/infiniband/ulp/iser/iscsi_iser.c
index be1b9fb..1b272a6 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -543,6 +543,13 @@ iscsi_iser_ep_disconnect(__u64 ep_handle)
iser_conn_terminate(ib_conn);
 }
 
+static int iscsi_iser_slave_configure(struct scsi_device *sdev)
+{
+   blk_queue_bounce_limit(sdev-request_queue, BLK_BOUNCE_ANY);
   
   You really don't want to do this.  That signals to the block layer that
   we have an iommu, although it's practically the same thing as a 64 bit
   DMA mask ... but I'd just leave it to the DMA mask to set this up
   correctly.  Anything else is asking for a subtle bug to turn up years
   from now when something causes the mask and the limit to be mismatched.
  
  Oh.  I decided to add that line for symmetry with TCP, and was
  convinced by the arguments here:
  
  commit b6d44fe9582b9d90a0b16f508ac08a90d899bf56
  Author: Mike Christie [EMAIL PROTECTED]
  Date:   Thu Jul 26 12:46:47 2007 -0500
  
  [SCSI] iscsi_tcp: Turn off bounce buffers
  
  It was found by LSI that on setups with large amounts of memory
  we were bouncing buffers when we did not need to. If the iscsi tcp
  code touches the data buffer (or a helper does),
  it will kmap the buffer. iscsi_tcp also does not interact with hardware,
  so it does not have any hw dma restrictions. This patch sets the bounce
  buffer settings for our device queue so buffers should not be bounced
  because of a driver limit.
  
  I don't see a convenient place to callback into particular iscsi
  devices to set the DMA mask per-host.  It has to go on the
  shost_gendev, right?, but only for TCP and iSER, not qla4xxx, which
  handles its DMA mask during device probe.
 
 You should be taking your mask from the underlying infiniband device as
 part of the setup, shouldn't you?

I think you're right about this.  All the existing IB HW tries to
set a 64-bit dma mask, but that's no reason to disable the mechanism
entirely in iser.  I'll remove that line that disables bouncing in
my patch.  Perhaps Mike will know if the iscsi_tcp usage is still
appropriate.

-- Pete
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] iscsi iser: remove DMA restrictions

2008-02-12 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Tue, 12 Feb 2008 15:10 -0600:
 On Tue, 2008-02-12 at 15:54 -0500, Pete Wyckoff wrote:
  iscsi_iser does not have any hardware DMA restrictions.  Add a
  slave_configure function to remove any DMA alignment restriction,
  allowing the use of direct IO from arbitrary offsets within a page.
  Also disable page bouncing; iser has no restrictions on which pages it
  can address.
  
  Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
  ---
   drivers/infiniband/ulp/iser/iscsi_iser.c |8 
   1 files changed, 8 insertions(+), 0 deletions(-)
  
  diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c 
  b/drivers/infiniband/ulp/iser/iscsi_iser.c
  index be1b9fb..1b272a6 100644
  --- a/drivers/infiniband/ulp/iser/iscsi_iser.c
  +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
  @@ -543,6 +543,13 @@ iscsi_iser_ep_disconnect(__u64 ep_handle)
  iser_conn_terminate(ib_conn);
   }
   
  +static int iscsi_iser_slave_configure(struct scsi_device *sdev)
  +{
  +   blk_queue_bounce_limit(sdev-request_queue, BLK_BOUNCE_ANY);
 
 You really don't want to do this.  That signals to the block layer that
 we have an iommu, although it's practically the same thing as a 64 bit
 DMA mask ... but I'd just leave it to the DMA mask to set this up
 correctly.  Anything else is asking for a subtle bug to turn up years
 from now when something causes the mask and the limit to be mismatched.

Oh.  I decided to add that line for symmetry with TCP, and was
convinced by the arguments here:

commit b6d44fe9582b9d90a0b16f508ac08a90d899bf56
Author: Mike Christie [EMAIL PROTECTED]
Date:   Thu Jul 26 12:46:47 2007 -0500

[SCSI] iscsi_tcp: Turn off bounce buffers

It was found by LSI that on setups with large amounts of memory
we were bouncing buffers when we did not need to. If the iscsi tcp
code touches the data buffer (or a helper does),
it will kmap the buffer. iscsi_tcp also does not interact with hardware,
so it does not have any hw dma restrictions. This patch sets the bounce
buffer settings for our device queue so buffers should not be bounced
because of a driver limit.

I don't see a convenient place to callback into particular iscsi
devices to set the DMA mask per-host.  It has to go on the
shost_gendev, right?, but only for TCP and iSER, not qla4xxx, which
handles its DMA mask during device probe.

-- Pete
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] iscsi iser: increase sg_tablesize

2008-02-12 Thread Pete Wyckoff
Increase FMR limits from 512 kB to 1 MB.  This matches the limit
used by SRP which also uses FMRs for memory mapping.  Also
correct a bug where the sg_tablesize was 1 smaller than what was
allocated, by folding the + 1 used everywhere into the
definition of the constant.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 drivers/infiniband/ulp/iser/iscsi_iser.h |3 ++-
 drivers/infiniband/ulp/iser/iser_verbs.c |6 ++
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h 
b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 1ee867b..db8f81a 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -87,7 +87,8 @@
 #define MASK_4K(~(SIZE_4K-1))
 
/* support upto 512KB in one RDMA */
-#define ISCSI_ISER_SG_TABLESIZE (0x8  SHIFT_4K)
+/* FMR space for 1 MB of 4k-page transfers, plus 1 if not page aligned */
+#define ISCSI_ISER_SG_TABLESIZE (((120)  SHIFT_4K) + 1)
 #define ISCSI_ISER_MAX_LUN 256
 #define ISCSI_ISER_MAX_CMD_LEN 16
 
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c 
b/drivers/infiniband/ulp/iser/iser_verbs.c
index 714b8db..c5b374f 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -140,7 +140,7 @@ static int iser_create_ib_conn_res(struct iser_conn 
*ib_conn)
device = ib_conn-device;
 
ib_conn-page_vec = kmalloc(sizeof(struct iser_page_vec) +
-   (sizeof(u64) * (ISCSI_ISER_SG_TABLESIZE 
+1)),
+   sizeof(u64) * ISCSI_ISER_SG_TABLESIZE,
GFP_KERNEL);
if (!ib_conn-page_vec) {
ret = -ENOMEM;
@@ -149,9 +149,7 @@ static int iser_create_ib_conn_res(struct iser_conn 
*ib_conn)
ib_conn-page_vec-pages = (u64 *) (ib_conn-page_vec + 1);
 
params.page_shift= SHIFT_4K;
-   /* when the first/last SG element are not start/end *
-* page aligned, the map whould be of N+1 pages */
-   params.max_pages_per_fmr = ISCSI_ISER_SG_TABLESIZE + 1;
+   params.max_pages_per_fmr = ISCSI_ISER_SG_TABLESIZE;
/* make the pool size twice the max number of SCSI commands *
 * the ML is expected to queue, watermark for unmap at 50%  */
params.pool_size = ISCSI_DEF_XMIT_CMDS_MAX * 2;
-- 
1.5.3.8

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] iscsi bidi varlen support

2008-02-12 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Thu, 31 Jan 2008 20:08 +0200:
 Cheers after 1.3 years these can go in.
 
 [PATCH 1/3] iscsi: extended cdb support
The varlen support is not yet in mainline for
   block and scsi-ml. But the API for drivers will
   not change. All LLD need to do is max_command to
   the it's maximum and be ready for bigger commands.
   This is what's done here. Once these commands start
   coming iscsi will be ready for them.
 
 [PATCH 2/3] iscsi: bidi support - libiscsi
 [PATCH 3/3] iscsi: bidi support - iscsi_tcp
   bidirectional commands support in iscsi.
   iSER is not yet ready, but it will not break.
   There is already a mechanism in libiscsi that will
   return error if bidi commands are sent iSER way.
 
 Pete please send me the iSER bits so we can port them
 to this latest version.
 
 Mike these patches are ontop of iscs branch of the iscsi
 git tree, they will apply but for compilation you will need
 to sync with Linus mainline. The patches are for the in-tree
 iscsi code. I own you the compat patch for the out-off-tree
 code, but this I will only be Sunday.
 
 If we do it fast it might get accepted to 2.6.25 merge window
 
 Everybody is invited to a party at Shila ben-yhuda 52 Tel-Aviv
 9:45 pm. Drinks and wonderful see-food on us :)

Here's the patch to add bidi support to iSER too.  It works
with my setup, but could use more testing.  Note that this does
rely on the 3 patches quoted above.

-- Pete


From: Pete Wyckoff [EMAIL PROTECTED]

Support bidirectional SCSI commands in iSCSI iSER.  The handling of
data buffers for each direction was already mostly separated.  It was
necessary to separate the unmapping of data bufs when one was found
to be unaligned.  This was done by adding a direction parameter to
iser_dma_unmap_task_data() and calling it appropriately.  Also
iser_ctask_rdma_finalize() used local variables in such a way that
it assumed a unidirectional command; that was split up to make it
cleaner and order the operations correctly for bidi.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 drivers/infiniband/ulp/iser/iscsi_iser.h |3 +-
 drivers/infiniband/ulp/iser/iser_initiator.c |   79 +++---
 drivers/infiniband/ulp/iser/iser_memory.c|9 ++-
 3 files changed, 41 insertions(+), 50 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h 
b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 66905df..f5497b0 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -356,5 +356,6 @@ int iser_dma_map_task_data(struct iscsi_iser_cmd_task 
*iser_ctask,
enum   iser_data_dir   iser_dir,
enum   dma_data_direction  dma_dir);
 
-void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask);
+void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask,
+ enum iser_data_dir cmd_dir);
 #endif
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c 
b/drivers/infiniband/ulp/iser/iser_initiator.c
index ea3f5dc..491ffab 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -325,9 +325,8 @@ int iser_send_command(struct iscsi_conn *conn,
 {
struct iscsi_iser_conn *iser_conn = conn-dd_data;
struct iscsi_iser_cmd_task *iser_ctask = ctask-dd_data;
-   struct iser_dto *send_dto = NULL;
-   unsigned long edtl;
-   int err = 0;
+   struct iser_dto *send_dto;
+   int err;
struct iser_data_buf *data_buf;
 
struct iscsi_cmd *hdr =  ctask-hdr;
@@ -340,37 +339,32 @@ int iser_send_command(struct iscsi_conn *conn,
if (iser_check_xmit(conn, ctask))
return -ENOBUFS;
 
-   edtl = ntohl(hdr-data_length);
-
/* build the tx desc regd header and add it to the tx desc dto */
iser_ctask-desc.type = ISCSI_TX_SCSI_COMMAND;
send_dto = iser_ctask-desc.dto;
send_dto-ctask = iser_ctask;
iser_create_send_desc(iser_conn, iser_ctask-desc, ctask-hdr_len);
 
-   if (hdr-flags  ISCSI_FLAG_CMD_READ)
-   data_buf = iser_ctask-data[ISER_DIR_IN];
-   else
-   data_buf = iser_ctask-data[ISER_DIR_OUT];
-
-   if (scsi_sg_count(sc)) { /* using a scatter list */
-   data_buf-buf  = scsi_sglist(sc);
-   data_buf-size = scsi_sg_count(sc);
-   }
-
-   data_buf-data_len = scsi_bufflen(sc);
-
if (hdr-flags  ISCSI_FLAG_CMD_READ) {
-   err = iser_prepare_read_cmd(ctask, edtl);
+   data_buf = iser_ctask-data[ISER_DIR_IN];
+   data_buf-buf  = scsi_in(sc)-table.sgl;
+   data_buf-size = scsi_in(sc)-table.nents;
+   data_buf-data_len = scsi_in(sc)-length;
+   err = iser_prepare_read_cmd(ctask, data_buf-data_len);
if (err)
goto send_command_error

[PATCH] bsg: bidi bio map failure fix

2008-02-12 Thread Pete Wyckoff
If blk_rq_map_user requires more than one bio, and fails mapping
somewhere after the first bio, it will return with rq-bio set to
non-NULL, but it will have already unmapped the partial bio.  The
out: error exit section will see the non-null bio and try to unmap
it again, triggering a mapcount bug via bad_page().

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 block/bsg.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 3337125..bba7154 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -295,8 +295,10 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr)
 
dxferp = (void*)(unsigned long)hdr-din_xferp;
ret =  blk_rq_map_user(q, next_rq, dxferp, hdr-din_xfer_len);
-   if (ret)
+   if (ret) {
+   next_rq-bio = NULL;  /* do not unmap twice */
goto out;
+   }
}
 
if (hdr-dout_xfer_len) {
-- 
1.5.3.8

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] iscsi bidi varlen support

2008-02-12 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Tue, 12 Feb 2008 15:12 -0500:
 [EMAIL PROTECTED] wrote on Thu, 31 Jan 2008 20:08 +0200:
  Cheers after 1.3 years these can go in.
  
  [PATCH 1/3] iscsi: extended cdb support
 The varlen support is not yet in mainline for
block and scsi-ml. But the API for drivers will
not change. All LLD need to do is max_command to
the it's maximum and be ready for bigger commands.
This is what's done here. Once these commands start
coming iscsi will be ready for them.
  
  [PATCH 2/3] iscsi: bidi support - libiscsi
  [PATCH 3/3] iscsi: bidi support - iscsi_tcp
bidirectional commands support in iscsi.
iSER is not yet ready, but it will not break.
There is already a mechanism in libiscsi that will
return error if bidi commands are sent iSER way.
  
  Pete please send me the iSER bits so we can port them
  to this latest version.
  
  Mike these patches are ontop of iscs branch of the iscsi
  git tree, they will apply but for compilation you will need
  to sync with Linus mainline. The patches are for the in-tree
  iscsi code. I own you the compat patch for the out-off-tree
  code, but this I will only be Sunday.
 
 Here's the patch to add bidi support to iSER too.  It works
 with my setup, but could use more testing.  Note that this does
 rely on the 3 patches quoted above.

Similar, for varlen support to iSER.  Probably apply this one before
the bidi one I just sent.

-- Pete


From: Pete Wyckoff [EMAIL PROTECTED]
Subject: [PATCH] iscsi iser: varlen

Handle variable-length CDBs in iSER.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 drivers/infiniband/ulp/iser/iscsi_iser.c |5 +++--
 drivers/infiniband/ulp/iser/iscsi_iser.h |2 +-
 drivers/infiniband/ulp/iser/iser_initiator.c |   16 ++--
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c 
b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 5f2284d..9dfc310 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -401,7 +401,8 @@ iscsi_iser_session_create(struct iscsi_transport *iscsit,
ctask  = session-cmds[i];
iser_ctask = ctask-dd_data;
ctask-hdr = (struct iscsi_cmd *)iser_ctask-desc.iscsi_header;
-   ctask-hdr_max = sizeof(iser_ctask-desc.iscsi_header);
+   ctask-hdr_max = sizeof(iser_ctask-desc.iscsi_header) +
+sizeof(iser_ctask-desc.hdrextbuf);
}
 
for (i = 0; i  session-mgmtpool_max; i++) {
@@ -604,7 +605,7 @@ static struct iscsi_transport iscsi_iser_transport = {
.host_template  = iscsi_iser_sht,
.conndata_size  = sizeof(struct iscsi_conn),
.max_lun= ISCSI_ISER_MAX_LUN,
-   .max_cmd_len= ISCSI_ISER_MAX_CMD_LEN,
+   .max_cmd_len= 260,
/* session management */
.create_session = iscsi_iser_session_create,
.destroy_session= iscsi_session_teardown,
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h 
b/drivers/infiniband/ulp/iser/iscsi_iser.h
index db8f81a..66905df 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -90,7 +90,6 @@
 /* FMR space for 1 MB of 4k-page transfers, plus 1 if not page aligned */
 #define ISCSI_ISER_SG_TABLESIZE (((120)  SHIFT_4K) + 1)
 #define ISCSI_ISER_MAX_LUN 256
-#define ISCSI_ISER_MAX_CMD_LEN 16
 
 /* QP settings */
 /* Maximal bounds on received asynchronous PDUs */
@@ -217,6 +216,7 @@ enum iser_desc_type {
 struct iser_desc {
struct iser_hdr  iser_header;
struct iscsi_hdr iscsi_header;
+   char hdrextbuf[ISCSI_MAX_AHS_SIZE];
struct iser_regd_buf hdr_regd_buf;
void *data; /* used by RX  TX_CONTROL 
*/
struct iser_regd_buf data_regd_buf; /* used by RX  TX_CONTROL 
*/
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c 
b/drivers/infiniband/ulp/iser/iser_initiator.c
index 83247f1..ea3f5dc 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -246,9 +246,13 @@ post_rx_kmalloc_failure:
return err;
 }
 
-/* creates a new tx descriptor and adds header regd buffer */
+/**
+ * iser_create_send_desc - creates a new tx descriptor and adds header regd 
buffer
+ * @iscsi_hdr_len: length of struct iscsi_hdr and any AHSes in hdrextbuf
+ */
 static void iser_create_send_desc(struct iscsi_iser_conn *iser_conn,
- struct iser_desc   *tx_desc)
+ struct iser_desc   *tx_desc,
+ int iscsi_hdr_len)
 {
struct iser_regd_buf *regd_hdr = tx_desc-hdr_regd_buf;
struct iser_dto  *send_dto = tx_desc

[PATCH 1/3] iscsi iser: remove DMA restrictions

2008-02-12 Thread Pete Wyckoff
iscsi_iser does not have any hardware DMA restrictions.  Add a
slave_configure function to remove any DMA alignment restriction,
allowing the use of direct IO from arbitrary offsets within a page.
Also disable page bouncing; iser has no restrictions on which pages it
can address.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 drivers/infiniband/ulp/iser/iscsi_iser.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c 
b/drivers/infiniband/ulp/iser/iscsi_iser.c
index be1b9fb..1b272a6 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -543,6 +543,13 @@ iscsi_iser_ep_disconnect(__u64 ep_handle)
iser_conn_terminate(ib_conn);
 }
 
+static int iscsi_iser_slave_configure(struct scsi_device *sdev)
+{
+   blk_queue_bounce_limit(sdev-request_queue, BLK_BOUNCE_ANY);
+   blk_queue_dma_alignment(sdev-request_queue, 0);
+   return 0;
+}
+
 static struct scsi_host_template iscsi_iser_sht = {
.module = THIS_MODULE,
.name   = iSCSI Initiator over iSER, v. DRV_VER,
@@ -556,6 +563,7 @@ static struct scsi_host_template iscsi_iser_sht = {
.eh_device_reset_handler= iscsi_eh_device_reset,
.eh_host_reset_handler  = iscsi_eh_host_reset,
.use_clustering = DISABLE_CLUSTERING,
+   .slave_configure= iscsi_iser_slave_configure,
.proc_name  = iscsi_iser,
.this_id= -1,
 };
-- 
1.5.3.8

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] iscsi: bidi support - libiscsi

2008-02-11 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Thu, 31 Jan 2008 22:29 +0200:
   iscsi bidi support at the generic libiscsi level
   - prepare the additional bidi_read rlength header.
   - access the right scsi_in() and/or scsi_out() side of things.
 also for resid.
   - Handle BIDI underflow overflow from target
 
 Signed-off-by: Boaz Harrosh [EMAIL PROTECTED]

I see you do this a bit differently than in your previous patch set.
In particular, the residual handling in libiscsi.c.  (I'm editing in
a bit more context to the patch below.)

 + if (scsi_bidi_cmnd(sc) 
 + (rhdr-flags  (ISCSI_FLAG_CMD_BIDI_UNDERFLOW |
 + ISCSI_FLAG_CMD_BIDI_OVERFLOW))) {
 + int res_count = be32_to_cpu(rhdr-bi_residual_count);
 +
 + if (res_count  0 
 + (rhdr-flags  ISCSI_FLAG_CMD_BIDI_OVERFLOW ||
 + res_count = scsi_in(sc)-length))
 + scsi_in(sc)-resid = res_count;
 + else
 + sc-result =
 + (DID_BAD_TARGET  16) | rhdr-cmd_status;
 + }
 +
   if (rhdr-flags  (ISCSI_FLAG_CMD_UNDERFLOW |
  ISCSI_FLAG_CMD_OVERFLOW)) {
   int res_count = be32_to_cpu(rhdr-residual_count);

   if (res_count  0 
   (rhdr-flags  ISCSI_FLAG_CMD_OVERFLOW ||
res_count = scsi_bufflen(sc)))
 + /* write side for bidi or uni-io set_resid */
   scsi_set_resid(sc, res_count);
   else
   sc-result = (DID_BAD_TARGET  16) | rhdr-cmd_status;
   } else if (rhdr-flags  (ISCSI_FLAG_CMD_BIDI_UNDERFLOW |
 ISCSI_FLAG_CMD_BIDI_OVERFLOW))
   sc-result = (DID_BAD_TARGET  16) | rhdr-cmd_status;

I haven't tested this, but, consider a bidi command that results in
an overflow on the read part, and no overflow on the write part.
E.g., the user supplied a data-in buffer that wasn't big enough to
hold the returned data from the target, but the data-out buffer was
just right.

Then this code will set scsi_in(sc)-resid properly, informing the
caller that there were extra bytes that were not transferred.  But
the else clause at the bottom will also set sc-result to be bad.
I don't think we want this.

Your earlier patch got rid of the second bidi_overflow handler and
just did all the logic for both bidi and non-bidi cases in a single
if clause.  Seemed better.

-- Pete
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] iscsi: bidi support - libiscsi

2008-02-11 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Mon, 11 Feb 2008 18:05 +0200:
 You are most probably right I will investigate what happened. It looks
 like I went back to some old version right? or a merge fallout
 Thanks for reviewing.
 
 Please also test latest head-of-line code if possible + iscsi patches
 + last varlen I sent.
 
 Is there any new patches I need for 2.6.24 or head-of-line for my 
 osd-dev tree?

Testing now.  My patch set is actually shrinking (!) thanks to some
merges.  Some rebasing was required to apply your three varlen
patches and three bidi patches, but I'm sure you'll update your tree
and push those upstream soon enough.  I'll take a look at iser bidi
then update my patch list and let you know soon.

-- Pete
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integration of SCST in the mainstream Linux kernel

2008-02-02 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Wed, 30 Jan 2008 10:34 -0600:
 On Wed, 2008-01-30 at 09:38 +0100, Bart Van Assche wrote:
  On Jan 30, 2008 12:32 AM, FUJITA Tomonori [EMAIL PROTECTED] wrote:
  
   iSER has parameters to limit the maximum size of RDMA (it needs to
   repeat RDMA with a poor configuration)?
  
  Please specify which parameters you are referring to. As you know I
  had already repeated my tests with ridiculously high values for the
  following iSER parameters: FirstBurstLength, MaxBurstLength and
  MaxRecvDataSegmentLength (16 MB, which is more than the 1 MB block
  size specified to dd).
 
 the 1Mb block size is a bit of a red herring.  Unless you've
 specifically increased the max_sector_size and are using an sg_chain
 converted driver, on x86 the maximum possible transfer accumulation is
 0.5MB.
 
 I certainly don't rule out that increasing the transfer size up from
 0.5MB might be the way to improve STGT efficiency, since at an 1GB/s
 theoretical peak, that's roughly 2000 context switches per I/O; however,
 It doesn't look like you've done anything that will overcome the block
 layer limitations.

The MRDSL parameter has no effect on iSER, as the RFC describes.
How to transfer data to satisfy a command is solely up to the
target.  So you would need both big requests from the client, then
look at how the target will send the data.

I've only used 512 kB for the RDMA transfer size from the target, as
it matches the default client size and was enough to get good
performance out of my IB gear and minimizes resource consumption on
the target.  It's currently hard-coded as a #define.  There is no
provision in the protocol for the client to dictate the value.

If others want to spend some effort trying to tune stgt for iSER,
there are a fair number of comments in the code, including a big one
that explains this RDMA transfer size issue.  And I'll answer
informed questions as I can.  But I'm not particularly interested in
arguing about which implementation is best, or trying to interpret
bandwidth comparison numbers from poorly designed tests.  It takes
work to understand these issues.

-- Pete
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] use the cmd_type of a leading request for scsi_init_sgtable

2008-01-25 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Fri, 25 Jan 2008 20:05 -0600:
 On Sat, 2008-01-26 at 09:57 +0900, FUJITA Tomonori wrote:
  This is against the scsi-bidi tree.
  
  We need to use the cmd_type of a leading request for scsi_init_sgtable
  to set up scsi_data_buffer:length of a bidi request properly.
  
  An alternative approach is setting the cmd_type of a leading request
  and its bidi request (*1). But the block layer and scsi-ml don't
  expect that the leading request and its sub-requests have the
  different command types.
  
  Note that scsi_debug's XDWRITEREAD_10 support is fine without this
  patch since req-nr_sectors works for it but req-nr_sectors doesn't
  work for everyone.
  
  (*1)
  
  http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg12669.html
  
  =
  From: FUJITA Tomonori [EMAIL PROTECTED]
  Subject: [PATCH] use the cmd_type of a leading request for scsi_init_sgtable
  
  We need to use the cmd_type of a leading request for scsi_init_sgtable
  to set up scsi_data_buffer:length of its bidi request properly.
 
 This seems to be a very convoluted work around for the fact that we
 forgot to set the cmd_type on the subordinate request.
 
 Wouldn't this be a better fix?
 
 James
 
 ---
 
 diff --git a/block/bsg.c b/block/bsg.c
 index 69b0a9d..8917c51 100644
 --- a/block/bsg.c
 +++ b/block/bsg.c
 @@ -279,6 +279,7 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr)
   goto out;
   }
   rq-next_rq = next_rq;
 + next_rq-cmd_type = rq-cmd_type;
  
   dxferp = (void*)(unsigned long)hdr-din_xferp;
   ret =  blk_rq_map_user(q, next_rq, dxferp, hdr-din_xfer_len);
 
 

Agree.  I did essentially the same thing in:

http://marc.info/?l=linux-scsim=11983623270

Tomo, you may want to have a look at all the bsg patches I sent back
in Dec 2007.  Boaz has the minimum required for bidi in his tree.
There are a few more here too, if you want to see what we need for OSD:

http://git.osc.edu/?p=linux.git;a=summary

-- Pete
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Performance of SCST versus STGT

2008-01-17 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Thu, 17 Jan 2008 19:05 +0900:
 On Thu, 17 Jan 2008 12:48:28 +0300
 Vladislav Bolkhovitin [EMAIL PROTECTED] wrote:
 
  FUJITA Tomonori wrote:
   On Thu, 17 Jan 2008 10:27:08 +0100
   Bart Van Assche [EMAIL PROTECTED] wrote:
   
   
  Hello,
  
  I have performed a test to compare the performance of SCST and STGT.
  Apparently the SCST target implementation performed far better than
  the STGT target implementation. This makes me wonder whether this is
  due to the design of SCST or whether STGT's performance can be
  improved to the level of SCST ?
  
  Test performed: read 2 GB of data in blocks of 1 MB from a target (hot
  cache -- no disk reads were performed, all reads were from the cache).
  Test command: time dd if=/dev/sde of=/dev/null bs=1M count=2000
  
STGT read SCST read
 performance (MB/s)   performance (MB/s)
  Ethernet (1 Gb/s network)7789
  IPoIB (8 Gb/s network)   82   229
  SRP (8 Gb/s network)N/A   600
  iSER (8 Gb/s network)80   N/A
  
  These results show that SCST uses the InfiniBand network very well
  (effectivity of about 88% via SRP), but that the current STGT version
  is unable to transfer data faster than 82 MB/s. Does this mean that
  there is a severe bottleneck  present in the current STGT
  implementation ?
   
   
   I don't know about the details but Pete said that he can achieve more
   than 900MB/s read performance with tgt iSER target using ramdisk.
   
   http://www.mail-archive.com/[EMAIL PROTECTED]/msg4.html
  
  Please don't confuse multithreaded latency insensitive workload with 
  single threaded, hence latency sensitive one.
 
 Seems that he can get good performance with single threaded workload:
 
 http://www.osc.edu/~pw/papers/wyckoff-iser-snapi07-talk.pdf
 
 But I don't know about the details so let's wait for Pete to comment
 on this.

Page 16 is pretty straight forward.  One command outstanding from
the client.  It is an OSD read command.  Data on tmpfs.  500 MB/s is
pretty easy to get on IB.

The other graph on page 23 is for block commands.  600 MB/s ish.
Still single command; so essentially a latency test.  Dominated by
the memcpy time from tmpfs to pinned IB buffer, as per page 24.

Erez said:

 We didn't run any real performance test with tgt, so I don't have
 numbers yet. I know that Pete got ~900 MB/sec by hacking sgp_dd, so all
 data was read/written to the same block (so it was all done in the
 cache). Pete - am I right?

Yes (actually just 1 thread in sg_dd).  This is obviously cheating.
Take the pread time to zero in SCSI Read analysis on page 24 to show
max theoretical.  It's IB theoretical minus some initiator and stgt
overheads.

The other way to get more read throughput is to throw multiple
simultaneous commands at the server.

There's nothing particularly stunning here.  Suspect Bart has
configuration issues if not even IPoIB will do  100 MB/s.

-- Pete

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bsg : Add support for io vectors in bsg

2008-01-14 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Fri, 11 Jan 2008 19:16 -0500:
 James Bottomley wrote:
 On Thu, 2008-01-10 at 16:46 -0500, Pete Wyckoff wrote:
 [EMAIL PROTECTED] wrote on Thu, 10 Jan 2008 14:55 -0600:
 On Thu, 2008-01-10 at 15:43 -0500, Pete Wyckoff wrote:
 [EMAIL PROTECTED] wrote on Wed, 09 Jan 2008 09:11 +0900:
 On Tue, 8 Jan 2008 17:09:18 -0500
 Pete Wyckoff [EMAIL PROTECTED] wrote:
 I took another look at the compat approach, to see if it is feasible
 to keep the compat handling somewhere else, without the use of #ifdef
 CONFIG_COMPAT and size-comparison code inside bsg.c.  I don't see how.
 The use of iovec is within a write operation on a char device.  It's
 not amenable to a compat_sys_ or a .compat_ioctl approach.

 I'm partial to #1 because the use of architecture-independent fields
 matches the rest of struct sg_io_v4.  But if you don't want to have
 another iovec type in the kernel, could we do #2 but just return
 -EINVAL if the need for compat is detected?  I.e. change
 dout_iovec_count to dout_iovec_length and do the math?
 If you are ok with removing the write/read interface and just have
 ioctl, we could can handle comapt stuff like others do. But I think
 that you (OSD people) really want to keep the write/read
 interface. Sorry, I think that there is no workaround to support iovec
 in bsg.
 I don't care about read/write in particular.  But we do need some
 way to launch asynchronous SCSI commands, and currently read/write
 are the only way to do that in bsg.  The reason is to keep multiple
 spindles busy at the same time.
 Won't multi-threading the ioctl calls achieve the same effect?  Or do
 you trip over BKL there?
 There's no BKL on (new) ioctls anymore, at least.  A thread per
 device would be feasible perhaps.  But if you want any sort of
 pipelining out of the device, esp. in the remote iSCSI case, you
 need to have a good number of commands outstanding to each device.
 So a thread per command per device.  Typical iSCSI queue depth of
 128 times 16 devices for a small setup is a lot of threads.

 I was actually thinking of a thread per outstanding command.

 The pthread/pipe latency overhead is not insignificant for fast
 storage networks too.

 How about these new ioctls instead of read/write:

 SG_IO_SUBMIT - start a new blk_execute_rq_nowait()
 SG_IO_TEST   - complete and return a previous req
 SG_IO_WAIT   - wait for a req to finish, interruptibly

 Then old write users will instead do ioctl SUBMIT.  Read users will
 do TEST for non-blocking fd, or WAIT for blocking.  And SG_IO could
 be implemented as SUBMIT + WAIT.

 Then we can do compat_ioctl and convert up iovecs out-of-line before
 calling the normal functions.

 Let me know if you want a patch for this.
 Really, the thought of re-inventing yet another async I/O interface
 isn't very appealing.
 I'm fine with read/write, except Tomo is against handling iovecs
 because of the compat complexity with struct iovec being different
 on 32- vs 64-bit.  There is a standard way to do compat ioctl that
 hides this handling in a different file (not bsg.c), which is the
 only reason I'm even considering these ioctls.  I don't care about
 compat setups per se.

 Is there another async I/O mechanism?  Userspace builds the CDBs,
 just needs some way to drop them in SCSI ML.  BSG is almost perfect
 for this, but doesn't do iovec, leading to lots of memcpy.

 No, it's just that async interfaces in Linux have a long and fairly
 unhappy history.

 The sg driver's async interface has been pretty stable for
 a long time. The sync SG_IO ioctl is built on top of the
 async interface. That makes the async interface extremely
 well tested.

 The write()/read() async interface in sg does have one
 problem: when a command is dispatched via a write()
 it would be very useful to get back a tag but that
 violates write()'s second argument: 'const void * buf'.
 That tag could be useful both for identification of the
 response and by task management functions.

 I was hoping that the 'flags' field in sgv4 could be used
 to implement the variants:
 SG_IO_SUBMIT - start a new blk_execute_rq_nowait()
 SG_IO_TEST   - complete and return a previous req
 SG_IO_WAIT   - wait for a req to finish, interruptibly

 that way the existing SG_IO ioctl is sufficient.

 And if Tomo doesn't want to do it in the bsg driver,
 then it could be done it the sg driver.

The sg driver already has async via read/write, and it works fine.
Perhaps someone wants the ioctl versions too, but that's not my main
goal here.

I think that sg doesn't bother with compat iovec handling.  It just
uses SZ_SG_IOVEC (defined as sizeof(sg_iovec_t)) and doesn't check
if the userspace iovec happens to be smaller.  Sg does have a compat
ioctl function; it just doesn't support SG_IO.  So, on 64-bit
kernel, read/write from 32-bit userspace with iovec will get
undefined results due to the mis-interpretation of the iovec fields,
while ioctl from 32-bit will fail with ENOIOCTLCMD

Re: [PATCH] bsg : Add support for io vectors in bsg

2008-01-10 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Wed, 09 Jan 2008 09:11 +0900:
 On Tue, 8 Jan 2008 17:09:18 -0500
 Pete Wyckoff [EMAIL PROTECTED] wrote:
  I took another look at the compat approach, to see if it is feasible
  to keep the compat handling somewhere else, without the use of #ifdef
  CONFIG_COMPAT and size-comparison code inside bsg.c.  I don't see how.
  The use of iovec is within a write operation on a char device.  It's
  not amenable to a compat_sys_ or a .compat_ioctl approach.
  
  I'm partial to #1 because the use of architecture-independent fields
  matches the rest of struct sg_io_v4.  But if you don't want to have
  another iovec type in the kernel, could we do #2 but just return
  -EINVAL if the need for compat is detected?  I.e. change
  dout_iovec_count to dout_iovec_length and do the math?
 
 If you are ok with removing the write/read interface and just have
 ioctl, we could can handle comapt stuff like others do. But I think
 that you (OSD people) really want to keep the write/read
 interface. Sorry, I think that there is no workaround to support iovec
 in bsg.

I don't care about read/write in particular.  But we do need some
way to launch asynchronous SCSI commands, and currently read/write
are the only way to do that in bsg.  The reason is to keep multiple
spindles busy at the same time.

How about these new ioctls instead of read/write:

SG_IO_SUBMIT - start a new blk_execute_rq_nowait()
SG_IO_TEST   - complete and return a previous req
SG_IO_WAIT   - wait for a req to finish, interruptibly

Then old write users will instead do ioctl SUBMIT.  Read users will
do TEST for non-blocking fd, or WAIT for blocking.  And SG_IO could
be implemented as SUBMIT + WAIT.

Then we can do compat_ioctl and convert up iovecs out-of-line before
calling the normal functions.

Let me know if you want a patch for this.

-- Pete
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bsg : Add support for io vectors in bsg

2008-01-10 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Thu, 10 Jan 2008 14:55 -0600:
 On Thu, 2008-01-10 at 15:43 -0500, Pete Wyckoff wrote:
  [EMAIL PROTECTED] wrote on Wed, 09 Jan 2008 09:11 +0900:
   On Tue, 8 Jan 2008 17:09:18 -0500
   Pete Wyckoff [EMAIL PROTECTED] wrote:
I took another look at the compat approach, to see if it is feasible
to keep the compat handling somewhere else, without the use of #ifdef
CONFIG_COMPAT and size-comparison code inside bsg.c.  I don't see how.
The use of iovec is within a write operation on a char device.  It's
not amenable to a compat_sys_ or a .compat_ioctl approach.

I'm partial to #1 because the use of architecture-independent fields
matches the rest of struct sg_io_v4.  But if you don't want to have
another iovec type in the kernel, could we do #2 but just return
-EINVAL if the need for compat is detected?  I.e. change
dout_iovec_count to dout_iovec_length and do the math?
   
   If you are ok with removing the write/read interface and just have
   ioctl, we could can handle comapt stuff like others do. But I think
   that you (OSD people) really want to keep the write/read
   interface. Sorry, I think that there is no workaround to support iovec
   in bsg.
  
  I don't care about read/write in particular.  But we do need some
  way to launch asynchronous SCSI commands, and currently read/write
  are the only way to do that in bsg.  The reason is to keep multiple
  spindles busy at the same time.
 
 Won't multi-threading the ioctl calls achieve the same effect?  Or do
 you trip over BKL there?

There's no BKL on (new) ioctls anymore, at least.  A thread per
device would be feasible perhaps.  But if you want any sort of
pipelining out of the device, esp. in the remote iSCSI case, you
need to have a good number of commands outstanding to each device.
So a thread per command per device.  Typical iSCSI queue depth of
128 times 16 devices for a small setup is a lot of threads.

The pthread/pipe latency overhead is not insignificant for fast
storage networks too.

  How about these new ioctls instead of read/write:
  
  SG_IO_SUBMIT - start a new blk_execute_rq_nowait()
  SG_IO_TEST   - complete and return a previous req
  SG_IO_WAIT   - wait for a req to finish, interruptibly
  
  Then old write users will instead do ioctl SUBMIT.  Read users will
  do TEST for non-blocking fd, or WAIT for blocking.  And SG_IO could
  be implemented as SUBMIT + WAIT.
  
  Then we can do compat_ioctl and convert up iovecs out-of-line before
  calling the normal functions.
  
  Let me know if you want a patch for this.
 
 Really, the thought of re-inventing yet another async I/O interface
 isn't very appealing.

I'm fine with read/write, except Tomo is against handling iovecs
because of the compat complexity with struct iovec being different
on 32- vs 64-bit.  There is a standard way to do compat ioctl that
hides this handling in a different file (not bsg.c), which is the
only reason I'm even considering these ioctls.  I don't care about
compat setups per se.

Is there another async I/O mechanism?  Userspace builds the CDBs,
just needs some way to drop them in SCSI ML.  BSG is almost perfect
for this, but doesn't do iovec, leading to lots of memcpy.

-- Pete
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bsg : Add support for io vectors in bsg

2008-01-08 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Sat, 05 Jan 2008 14:01 +0900:
 From: Deepak Colluru [EMAIL PROTECTED]
 Subject: [PATCH] bsg : Add support for io vectors in bsg
 Date: Fri, 4 Jan 2008 21:47:34 +0530 (IST)
 
  From: Deepak Colluru [EMAIL PROTECTED]
  
  Add support for io vectors in bsg.
  
  Signed-off-by: Deepak Colluru [EMAIL PROTECTED]
  ---
bsg.c |   52 +---
1 file changed, 49 insertions(+), 3 deletions(-)
 
 Thanks, but I have to NACK this.
 
 You can find the discussion about bsg io vector support and a similar
 patch in linux-scsi archive. I have no plan to support it since it
 needs the compat hack.

You may recall this is one of the patches I need to use bsg with OSD
devices.  OSDs overload the SCSI buffer model to put mulitple fields
in dataout and datain.  Some is user data, but some is more
logically created by a library.  Memcpying in userspace to wedge all
the segments into a single buffer is painful, and is required both
on outgoing and incoming data buffers.

There are two approaches to add iovec to bsg.

1.  Define a new sg_iovec_v4 that uses constant width types.  Both
32- and 64-bit userspace would hand arrays of this to the kernel.

struct sg_v4_iovec {
__u64 iov_base;
__u32 iov_len;
__u32 __pad1;
};

Old patch here:  http://article.gmane.org/gmane.linux.scsi/30461/


2.  Do as Deepak has done, using the existing sg_iovec, but then
also work around the compat issue.  Old v3 sg_iovec is:

typedef struct sg_iovec /* same structure as used by readv() Linux system */
{   /* call. It defines one scatter-gather element. */
void __user *iov_base;  /* Starting address  */
size_t iov_len; /* Length in bytes  */
} sg_iovec_t;

Old patch here:  http://article.gmane.org/gmane.linux.scsi/30460/

I took another look at the compat approach, to see if it is feasible
to keep the compat handling somewhere else, without the use of #ifdef
CONFIG_COMPAT and size-comparison code inside bsg.c.  I don't see how.
The use of iovec is within a write operation on a char device.  It's
not amenable to a compat_sys_ or a .compat_ioctl approach.

I'm partial to #1 because the use of architecture-independent fields
matches the rest of struct sg_io_v4.  But if you don't want to have
another iovec type in the kernel, could we do #2 but just return
-EINVAL if the need for compat is detected?  I.e. change
dout_iovec_count to dout_iovec_length and do the math?

-- Pete
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] relax scsi dma alignment

2008-01-01 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Mon, 31 Dec 2007 16:43 -0600:
 This patch relaxes the default SCSI DMA alignment from 512 bytes to 4
 bytes.  I remember from previous discussions that usb and firewire have
 sector size alignment requirements, so I upped their alignments in the
 respective slave allocs.
 
 The reason for doing this is so that we don't get such a huge amount of
 copy overhead in bio_copy_user() for udev.  (basically all inquiries it
 issues can now be directly mapped).

Great change.  Here's another patch.  It allows a queue
dma_alignment of 0 bytes, for drivers that can move data
at byte granularity.

Two drivers try to turn off DMA alignment restrictions by
setting the dma_alignment to zero:

drivers/scsi/iscsi_tcp.c: blk_queue_dma_alignment(sdev-request_queue, 0);
drivers/scsi/scsi_tgt_lib.c: blk_queue_dma_alignment(q, 0);

But they end up doing copies due to the way that queue_dma_alignment()
returns 511 in this case.

-- Pete

---

From 90ee6d61ad71a024815eee2b416edb40c6b01256 Mon Sep 17 00:00:00 2001
From: Pete Wyckoff [EMAIL PROTECTED]
Date: Tue, 1 Jan 2008 10:23:02 -0500
Subject: [PATCH] block: allow queue dma_alignment of zero

Let queue_dma_alignment return 0 if it was specifically set to 0.
This permits devices with no particular alignment restrictions to
use arbitrary user space buffers without copying.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 include/linux/blkdev.h |7 +--
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index aa3090c..eea31c2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -860,12 +860,7 @@ static inline int bdev_hardsect_size(struct block_device 
*bdev)
 
 static inline int queue_dma_alignment(struct request_queue *q)
 {
-   int retval = 511;
-
-   if (q  q-dma_alignment)
-   retval = q-dma_alignment;
-
-   return retval;
+   return q ? q-dma_alignment : 511;
 }
 
 /* assumes size  256 */
-- 
1.5.3.6

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] relax scsi dma alignment

2008-01-01 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Tue, 01 Jan 2008 10:27 -0600:
 On Tue, 2008-01-01 at 11:08 -0500, Pete Wyckoff wrote:
  [EMAIL PROTECTED] wrote on Mon, 31 Dec 2007 16:43 -0600:
   This patch relaxes the default SCSI DMA alignment from 512 bytes to 4
   bytes.  I remember from previous discussions that usb and firewire have
   sector size alignment requirements, so I upped their alignments in the
   respective slave allocs.
   
   The reason for doing this is so that we don't get such a huge amount of
   copy overhead in bio_copy_user() for udev.  (basically all inquiries it
   issues can now be directly mapped).
  
  Great change.  Here's another patch.  It allows a queue
  dma_alignment of 0 bytes, for drivers that can move data
  at byte granularity.
  
  Two drivers try to turn off DMA alignment restrictions by
  setting the dma_alignment to zero:
  
  drivers/scsi/iscsi_tcp.c: blk_queue_dma_alignment(sdev-request_queue, 
  0);
  drivers/scsi/scsi_tgt_lib.c: blk_queue_dma_alignment(q, 0);
  
  But they end up doing copies due to the way that queue_dma_alignment()
  returns 511 in this case.
[..]
  From 90ee6d61ad71a024815eee2b416edb40c6b01256 Mon Sep 17 00:00:00 2001
  From: Pete Wyckoff [EMAIL PROTECTED]
  Date: Tue, 1 Jan 2008 10:23:02 -0500
  Subject: [PATCH] block: allow queue dma_alignment of zero
  
  Let queue_dma_alignment return 0 if it was specifically set to 0.
  This permits devices with no particular alignment restrictions to
  use arbitrary user space buffers without copying.
  
  Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
  ---
   include/linux/blkdev.h |7 +--
   1 files changed, 1 insertions(+), 6 deletions(-)
  
  diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
  index aa3090c..eea31c2 100644
  --- a/include/linux/blkdev.h
  +++ b/include/linux/blkdev.h
  @@ -860,12 +860,7 @@ static inline int bdev_hardsect_size(struct 
  block_device *bdev)
   
   static inline int queue_dma_alignment(struct request_queue *q)
   {
  -   int retval = 511;
  -
  -   if (q  q-dma_alignment)
  -   retval = q-dma_alignment;
  -
  -   return retval;
  +   return q ? q-dma_alignment : 511;
   }
   
   /* assumes size  256 */
 
 This is certainly a possible patch.  What assurance do you have that it
 doesn't upset block devices who set zero assuming they get the default
 alignment?

Code inspection of the initialization and use of this field.  I hope
anybody who sees a mistake here will point it out.

1. Initialization

Most users call blk_init_queue{,_node} to alloc and initialize a new
request_queue.  This leads to initialization of dma_alignment to 511
in blk_queue_make_request().

The rest of the callers use blk_alloc_queue{,_node}.  Most of those
call blk_queue_make_request() with a custom make_request_fn a few
lines later, similarly causing dma_alignment to be initialized to
non-zero.  The loop and pktcdvd drivers require a bit of reading to
convince oneself, but also can be seen to call
blk_queue_make_request() before using the queue.

2. Use of blk_queue_dma_alignment() to set, queue_dma_alignment() and
   -dma_alignment to get

Inspection of the 20-odd spots that match dma_alignment shows that
none of them set zero to expect the default.


Definitely a valid concern, but it seems to be a safe change, at
least for in-tree users.  Do you think a new request_queue flag for
zero-aware drivers and a WARN_ON that checks for zero and !zero_aware
would be worthwhile?

Without this change, we can go as low as two-byte alignment on
buffer start and length by setting dma_alignment to 1, but will do a
full copy if (buffer1) or (length1).

-- Pete

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] iscsi: extended cdb support

2007-12-22 Thread Pete Wyckoff
From: Boaz Harrosh [EMAIL PROTECTED]

  Once varlen cdbs are supported by the block and scsi-ml layers
  we can apply this patch to support extended CDBs in iscsi.

Signed-off-by: Boaz Harrosh [EMAIL PROTECTED]
Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 drivers/scsi/iscsi_tcp.c   |2 +-
 drivers/scsi/libiscsi.c|   56 
 include/scsi/iscsi_proto.h |6 +++-
 3 files changed, 56 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index e5be5fd..9fde5ce 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -1973,7 +1973,7 @@ static struct iscsi_transport iscsi_tcp_transport = {
.host_template  = iscsi_sht,
.conndata_size  = sizeof(struct iscsi_conn),
.max_conn   = 1,
-   .max_cmd_len= 16,
+   .max_cmd_len= SCSI_MAX_VARLEN_CDB_SIZE,
/* session management */
.create_session = iscsi_tcp_session_create,
.destroy_session= iscsi_tcp_session_destroy,
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 553168a..94a8046 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -137,6 +137,45 @@ static int iscsi_add_hdr(struct iscsi_cmd_task *ctask, 
unsigned len)
return 0;
 }
 
+/*
+ * make an extended cdb AHS
+ */
+static int iscsi_prep_ecdb_ahs(struct iscsi_cmd_task *ctask)
+{
+   struct scsi_cmnd *cmd = ctask-sc;
+   unsigned rlen, pad_len;
+   unsigned short ahslength;
+   struct iscsi_ecdb_ahdr *ecdb_ahdr;
+   int rc;
+
+   ecdb_ahdr = iscsi_next_hdr(ctask);
+   rlen = cmd-cmd_len - ISCSI_CDB_SIZE;
+
+   BUG_ON(rlen  sizeof(ecdb_ahdr-ecdb));
+   ahslength = rlen + sizeof(ecdb_ahdr-reserved);
+
+   pad_len = iscsi_padding(rlen);
+
+   rc = iscsi_add_hdr(ctask, sizeof(ecdb_ahdr-ahslength) +
+  sizeof(ecdb_ahdr-ahstype) + ahslength + pad_len);
+   if (rc)
+   return rc;
+
+   if (pad_len)
+   memset(ecdb_ahdr-ecdb[rlen], 0, pad_len);
+
+   ecdb_ahdr-ahslength = cpu_to_be16(ahslength);
+   ecdb_ahdr-ahstype = ISCSI_AHSTYPE_CDB;
+   ecdb_ahdr-reserved = 0;
+   memcpy(ecdb_ahdr-ecdb, cmd-cmnd + ISCSI_CDB_SIZE, rlen);
+
+   debug_scsi(iscsi_prep_ecdb_ahs: varlen_cdb_len %d 
+   rlen %d pad_len %d ahs_length %d iscsi_headers_size %u\n,
+   cmd-cmd_len, rlen, pad_len, ahslength, ctask-hdr_len);
+
+   return 0;
+}
+
 /**
  * iscsi_prep_scsi_cmd_pdu - prep iscsi scsi cmd pdu
  * @ctask: iscsi cmd task
@@ -150,7 +189,7 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task 
*ctask)
struct iscsi_session *session = conn-session;
struct iscsi_cmd *hdr = ctask-hdr;
struct scsi_cmnd *sc = ctask-sc;
-   unsigned hdrlength;
+   unsigned hdrlength, cmd_len;
int rc;
 
ctask-hdr_len = 0;
@@ -165,10 +204,17 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task 
*ctask)
hdr-cmdsn = cpu_to_be32(session-cmdsn);
session-cmdsn++;
hdr-exp_statsn = cpu_to_be32(conn-exp_statsn);
-   memcpy(hdr-cdb, sc-cmnd, sc-cmd_len);
-   if (sc-cmd_len  MAX_COMMAND_SIZE)
-   memset(hdr-cdb[sc-cmd_len], 0,
-   MAX_COMMAND_SIZE - sc-cmd_len);
+   cmd_len = sc-cmd_len;
+   if (cmd_len  ISCSI_CDB_SIZE)
+   memset(hdr-cdb[cmd_len], 0,
+   ISCSI_CDB_SIZE - cmd_len);
+   else if (cmd_len  ISCSI_CDB_SIZE) {
+   rc = iscsi_prep_ecdb_ahs(ctask);
+   if (rc)
+   return rc;
+   cmd_len = ISCSI_CDB_SIZE;
+   }
+   memcpy(hdr-cdb, sc-cmnd, cmd_len);
 
ctask-imm_count = 0;
if (sc-sc_data_direction == DMA_TO_DEVICE) {
diff --git a/include/scsi/iscsi_proto.h b/include/scsi/iscsi_proto.h
index 318a909..8c591ac 100644
--- a/include/scsi/iscsi_proto.h
+++ b/include/scsi/iscsi_proto.h
@@ -112,6 +112,7 @@ struct iscsi_ahs_hdr {
 
 #define ISCSI_AHSTYPE_CDB  1
 #define ISCSI_AHSTYPE_RLENGTH  2
+#define ISCSI_CDB_SIZE 16
 
 /* iSCSI PDU Header */
 struct iscsi_cmd {
@@ -125,7 +126,7 @@ struct iscsi_cmd {
__be32 data_length;
__be32 cmdsn;
__be32 exp_statsn;
-   uint8_t cdb[16];/* SCSI Command Block */
+   uint8_t cdb[ISCSI_CDB_SIZE];/* SCSI Command Block */
/* Additional Data (Command Dependent) */
 };
 
@@ -154,7 +155,8 @@ struct iscsi_ecdb_ahdr {
__be16 ahslength;   /* CDB length - 15, including reserved byte */
uint8_t ahstype;
uint8_t reserved;
-   uint8_t ecdb[260 - 16]; /* 4-byte aligned extended CDB spillover */
+   /* 4-byte aligned extended CDB spillover */
+   uint8_t ecdb[260 - ISCSI_CDB_SIZE];
 };
 
 /* SCSI Response Header */
-- 
1.5.3.6

-
To unsubscribe from this list: send

[PATCH 4/5] bsg: bidi fix

2007-12-22 Thread Pete Wyckoff
Fixes for bsg to handle bidirectional commands.  The next_rq part
of a bidi command must be marked REQ_TYPE_BLOCK_PC for scsi_init_sgtable.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 block/bsg.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 0d9364d..fc3a024 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -291,6 +291,7 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr)
goto out;
}
rq-next_rq = next_rq;
+   next_rq-cmd_type = REQ_TYPE_BLOCK_PC;
 
dxferp = (void*)(unsigned long)hdr-din_xferp;
ret =  blk_rq_map_user(q, next_rq, dxferp, hdr-din_xfer_len);
-- 
1.5.3.6

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] iscsi: extended cdb support

2007-12-22 Thread Pete Wyckoff
[Resend, date was in ancient history.]

From: Boaz Harrosh [EMAIL PROTECTED]

  Once varlen cdbs are supported by the block and scsi-ml layers
  we can apply this patch to support extended CDBs in iscsi.

Signed-off-by: Boaz Harrosh [EMAIL PROTECTED]
Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 drivers/scsi/iscsi_tcp.c   |2 +-
 drivers/scsi/libiscsi.c|   56 
 include/scsi/iscsi_proto.h |6 +++-
 3 files changed, 56 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index e5be5fd..9fde5ce 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -1973,7 +1973,7 @@ static struct iscsi_transport iscsi_tcp_transport = {
.host_template  = iscsi_sht,
.conndata_size  = sizeof(struct iscsi_conn),
.max_conn   = 1,
-   .max_cmd_len= 16,
+   .max_cmd_len= SCSI_MAX_VARLEN_CDB_SIZE,
/* session management */
.create_session = iscsi_tcp_session_create,
.destroy_session= iscsi_tcp_session_destroy,
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 553168a..94a8046 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -137,6 +137,45 @@ static int iscsi_add_hdr(struct iscsi_cmd_task *ctask, 
unsigned len)
return 0;
 }
 
+/*
+ * make an extended cdb AHS
+ */
+static int iscsi_prep_ecdb_ahs(struct iscsi_cmd_task *ctask)
+{
+   struct scsi_cmnd *cmd = ctask-sc;
+   unsigned rlen, pad_len;
+   unsigned short ahslength;
+   struct iscsi_ecdb_ahdr *ecdb_ahdr;
+   int rc;
+
+   ecdb_ahdr = iscsi_next_hdr(ctask);
+   rlen = cmd-cmd_len - ISCSI_CDB_SIZE;
+
+   BUG_ON(rlen  sizeof(ecdb_ahdr-ecdb));
+   ahslength = rlen + sizeof(ecdb_ahdr-reserved);
+
+   pad_len = iscsi_padding(rlen);
+
+   rc = iscsi_add_hdr(ctask, sizeof(ecdb_ahdr-ahslength) +
+  sizeof(ecdb_ahdr-ahstype) + ahslength + pad_len);
+   if (rc)
+   return rc;
+
+   if (pad_len)
+   memset(ecdb_ahdr-ecdb[rlen], 0, pad_len);
+
+   ecdb_ahdr-ahslength = cpu_to_be16(ahslength);
+   ecdb_ahdr-ahstype = ISCSI_AHSTYPE_CDB;
+   ecdb_ahdr-reserved = 0;
+   memcpy(ecdb_ahdr-ecdb, cmd-cmnd + ISCSI_CDB_SIZE, rlen);
+
+   debug_scsi(iscsi_prep_ecdb_ahs: varlen_cdb_len %d 
+   rlen %d pad_len %d ahs_length %d iscsi_headers_size %u\n,
+   cmd-cmd_len, rlen, pad_len, ahslength, ctask-hdr_len);
+
+   return 0;
+}
+
 /**
  * iscsi_prep_scsi_cmd_pdu - prep iscsi scsi cmd pdu
  * @ctask: iscsi cmd task
@@ -150,7 +189,7 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task 
*ctask)
struct iscsi_session *session = conn-session;
struct iscsi_cmd *hdr = ctask-hdr;
struct scsi_cmnd *sc = ctask-sc;
-   unsigned hdrlength;
+   unsigned hdrlength, cmd_len;
int rc;
 
ctask-hdr_len = 0;
@@ -165,10 +204,17 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task 
*ctask)
hdr-cmdsn = cpu_to_be32(session-cmdsn);
session-cmdsn++;
hdr-exp_statsn = cpu_to_be32(conn-exp_statsn);
-   memcpy(hdr-cdb, sc-cmnd, sc-cmd_len);
-   if (sc-cmd_len  MAX_COMMAND_SIZE)
-   memset(hdr-cdb[sc-cmd_len], 0,
-   MAX_COMMAND_SIZE - sc-cmd_len);
+   cmd_len = sc-cmd_len;
+   if (cmd_len  ISCSI_CDB_SIZE)
+   memset(hdr-cdb[cmd_len], 0,
+   ISCSI_CDB_SIZE - cmd_len);
+   else if (cmd_len  ISCSI_CDB_SIZE) {
+   rc = iscsi_prep_ecdb_ahs(ctask);
+   if (rc)
+   return rc;
+   cmd_len = ISCSI_CDB_SIZE;
+   }
+   memcpy(hdr-cdb, sc-cmnd, cmd_len);
 
ctask-imm_count = 0;
if (sc-sc_data_direction == DMA_TO_DEVICE) {
diff --git a/include/scsi/iscsi_proto.h b/include/scsi/iscsi_proto.h
index 318a909..8c591ac 100644
--- a/include/scsi/iscsi_proto.h
+++ b/include/scsi/iscsi_proto.h
@@ -112,6 +112,7 @@ struct iscsi_ahs_hdr {
 
 #define ISCSI_AHSTYPE_CDB  1
 #define ISCSI_AHSTYPE_RLENGTH  2
+#define ISCSI_CDB_SIZE 16
 
 /* iSCSI PDU Header */
 struct iscsi_cmd {
@@ -125,7 +126,7 @@ struct iscsi_cmd {
__be32 data_length;
__be32 cmdsn;
__be32 exp_statsn;
-   uint8_t cdb[16];/* SCSI Command Block */
+   uint8_t cdb[ISCSI_CDB_SIZE];/* SCSI Command Block */
/* Additional Data (Command Dependent) */
 };
 
@@ -154,7 +155,8 @@ struct iscsi_ecdb_ahdr {
__be16 ahslength;   /* CDB length - 15, including reserved byte */
uint8_t ahstype;
uint8_t reserved;
-   uint8_t ecdb[260 - 16]; /* 4-byte aligned extended CDB spillover */
+   /* 4-byte aligned extended CDB spillover */
+   uint8_t ecdb[260 - ISCSI_CDB_SIZE];
 };
 
 /* SCSI Response Header */
-- 
1.5.3.6

[PATCH 1/3] bsg bidi block pc

2007-08-15 Thread Pete Wyckoff
Set cmd_type on rq-next_rq to BLOCK_PC so that scsi_init_sgtable
knows to look at req-data_len rather than nr_sectors.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 block/bsg.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index ba4353a..eb0aaf4 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -278,6 +278,7 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr)
ret = -ENOMEM;
goto out;
}
+   next_rq-cmd_type = REQ_TYPE_BLOCK_PC;
rq-next_rq = next_rq;
 
dxferp = (void*)(unsigned long)hdr-din_xferp;
-- 
1.5.2.4

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] bsg bidirectional and variable length commands

2007-08-15 Thread Pete Wyckoff
There are three patches here to enable using BSG to send SCSI
commands across iscsi TCP that are bidirectional and/or use
variable length CDBs.  They sit on top of 2.6.23-rc2 plus Mike's
iscsi git plus the 12 core patches that Boaz has for bidirectional
support.

They apply to stock 2.6.23-rc2 but I don't think it is worth
anybody's time to take them yet.  Instead, I think it makes sense
for Boaz to hang onto these and submit them along as part of
the bigger bidirectional support picture.

-- Pete

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] iscsi tcp queue bidi

2007-08-15 Thread Pete Wyckoff
Mark queue_flags that bidirectional is acceptable for iscsi_tcp, as
BSG will check to make sure this bit is set.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 drivers/scsi/iscsi_tcp.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 35dd19f..8622930 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -2205,6 +2205,12 @@ static void iscsi_tcp_session_destroy(struct 
iscsi_cls_session *cls_session)
iscsi_session_teardown(cls_session);
 }
 
+static int iscsi_tcp_slave_alloc(struct scsi_device *sdev)
+{
+   set_bit(QUEUE_FLAG_BIDI, sdev-request_queue-queue_flags);
+   return 0;
+}
+
 static int iscsi_tcp_slave_configure(struct scsi_device *sdev)
 {
blk_queue_bounce_limit(sdev-request_queue, BLK_BOUNCE_ANY);
@@ -2224,6 +2230,7 @@ static struct scsi_host_template iscsi_sht = {
.eh_abort_handler   = iscsi_eh_abort,
.eh_host_reset_handler  = iscsi_eh_host_reset,
.use_clustering = DISABLE_CLUSTERING,
+   .slave_alloc= iscsi_tcp_slave_alloc,
.slave_configure= iscsi_tcp_slave_configure,
.proc_name  = iscsi_tcp,
.this_id= -1,
-- 
1.5.2.4

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] varlen bsg submit

2007-08-15 Thread Pete Wyckoff
Accept variable length SCSI commands through BSG.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 block/bsg.c |   19 +++
 1 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index eb0aaf4..c72b4f9 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -175,11 +175,22 @@ unlock:
 static int blk_fill_sgv4_hdr_rq(struct request_queue *q, struct request *rq,
struct sg_io_v4 *hdr, int has_write_perm)
 {
+   int len = hdr-request_len;
+
memset(rq-cmd, 0, BLK_MAX_CDB); /* ATAPI hates garbage after CDB */
 
if (copy_from_user(rq-cmd, (void *)(unsigned long)hdr-request,
-  hdr-request_len))
+  min(len, BLK_MAX_CDB)))
return -EFAULT;
+   if (len  BLK_MAX_CDB) {
+   rq-varlen_cdb_len = len;
+   rq-varlen_cdb = kmalloc(len, GFP_KERNEL);
+   if (rq-varlen_cdb == NULL)
+   return -ENOMEM;
+   if (copy_from_user(rq-varlen_cdb,
+  (void *)(unsigned long)hdr-request, len))
+   return -EFAULT;
+   }
 
if (hdr-subprotocol == BSG_SUB_PROTOCOL_SCSI_CMD) {
if (blk_verify_command(rq-cmd, has_write_perm))
@@ -190,7 +201,7 @@ static int blk_fill_sgv4_hdr_rq(struct request_queue *q, 
struct request *rq,
/*
 * fill in request structure
 */
-   rq-cmd_len = hdr-request_len;
+   rq-cmd_len = min(len, BLK_MAX_CDB);
rq-cmd_type = REQ_TYPE_BLOCK_PC;
 
rq-timeout = (hdr-timeout * HZ) / 1000;
@@ -212,8 +223,6 @@ bsg_validate_sgv4_hdr(struct request_queue *q, struct 
sg_io_v4 *hdr, int *rw)
 
if (hdr-guard != 'Q')
return -EINVAL;
-   if (hdr-request_len  BLK_MAX_CDB)
-   return -EINVAL;
if (hdr-dout_xfer_len  (q-max_sectors  9) ||
hdr-din_xfer_len  (q-max_sectors  9))
return -EIO;
@@ -303,6 +312,7 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr)
}
return rq;
 out:
+   kfree(rq-varlen_cdb);
blk_put_request(rq);
if (next_rq) {
blk_rq_unmap_user(next_rq-bio);
@@ -443,6 +453,7 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, 
struct sg_io_v4 *hdr,
}
 
blk_rq_unmap_user(bio);
+   kfree(rq-varlen_cdb);
blk_put_request(rq);
 
return ret;
-- 
1.5.2.4

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Announcing new open source iSER (iSCSI/RDMA) target

2007-07-30 Thread Pete Wyckoff
We are releasing code to add support for iSCSI Extensions for RDMA
(iSER) to the existing STGT user space SCSI target.  It uses
OpenFabrics libraries and kernel drivers to act as a SCSI target
over RDMA-capable devices.  The code has been tested against
the existing Linux iSER initiator over InfiniBand cards, but
should be specification compliant and work generally.

A bit of documentation is included, and a short technical report is
available at http://www.osc.edu/~pw/papers/iser-techreport.pdf .
For performance, a single SCSI client using iSCSI over gigabit
ethernet does 100 MB/s.  iSCSI over IPoIB gets 200 MB/s, and iSER
over native IB sees 500 MB/s.

More information on STGT is available at http://stgt.berlios.de .

The seven iSER patches can be downloaded from:

git://git.osc.edu/tgt

or browsed at:

http://git.osc.edu/?p=tgt.git;a=summary

New and modified files are distributed under a GPLv2 license.  I'll
submit individual patches to stgt-devel for review and eventual
inclusion in STGT.

-- Pete

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] bsg: add SCSI transport-level request support

2007-06-09 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Sat, 09 Jun 2007 00:12 +0900:
 SCSI transport-level requests such as SAS management protocol (SMP)
 skip blk_verify_command().
 
 Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED]
 ---
  block/bsg.c |   27 +--
  include/linux/bsg.h |6 ++
  2 files changed, 27 insertions(+), 6 deletions(-)
 
 diff --git a/block/bsg.c b/block/bsg.c
 index 13ecc95..f2a9979 100644
 --- a/block/bsg.c
 +++ b/block/bsg.c
 @@ -208,7 +208,11 @@ static int blk_fill_sgv4_hdr_rq(request_
   if (copy_from_user(rq-cmd, (void *)(unsigned long)hdr-request,
  hdr-request_len))
   return -EFAULT;
 - if (blk_verify_command(rq-cmd, has_write_perm))
 +
 + if (hdr-subprotocol == BSG_SUB_PROTOCOL_SCSI_CMD) {
 + if (blk_verify_command(rq-cmd, has_write_perm))
 + return -EPERM;
 + } else if (capable(CAP_SYS_RAWIO))
   return -EPERM;

Do you mean !capable here?

-- Pete
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH resend] bsg: return SAM device status code

2007-04-24 Thread Pete Wyckoff
Use the status codes from the spec, not the shifted-by-one codes
that are marked deprecated in scsi.h  This makes bsg v4 status
report the same value as sg v3 status too.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
Acked-by: Douglas Gilbert [EMAIL PROTECTED]
---
 block/bsg.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index f0753c0..92be6fa 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -440,7 +440,7 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, 
struct sg_io_v4 *hdr,
/*
 * fill in all the output members
 */
-   hdr-device_status = status_byte(rq-errors);
+   hdr-device_status = rq-errors  0xff;
hdr-transport_status = host_byte(rq-errors);
hdr-driver_status = driver_byte(rq-errors);
hdr-info = 0;
-- 
1.5.0.6

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] bsg: port to bidi

2007-04-24 Thread Pete Wyckoff
Changes required to make bsg patches work on top of bidi patches.  Adds
capability to bsg to handle bidirectional commands and extended CDBs.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 block/bsg.c|  106 ++--
 block/ll_rw_blk.c  |   30 +-
 include/linux/blkdev.h |1 +
 3 files changed, 87 insertions(+), 50 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 92be6fa..9d09505 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -95,6 +95,7 @@ struct bsg_command {
struct list_head list;
struct request *rq;
struct bio *bio;
+   struct bio *bidi_read_bio;
int err;
struct sg_io_v4 hdr;
struct sg_io_v4 __user *uhdr;
@@ -225,18 +226,31 @@ static struct bsg_command *bsg_get_command(struct 
bsg_device *bd)
 static int blk_fill_sgv4_hdr_rq(request_queue_t *q, struct request *rq,
struct sg_io_v4 *hdr, int has_write_perm)
 {
+   int len = hdr-request_len;
+   int cmd_len = min(len, BLK_MAX_CDB);
+
memset(rq-cmd, 0, BLK_MAX_CDB); /* ATAPI hates garbage after CDB */
 
if (copy_from_user(rq-cmd, (void *)(unsigned long)hdr-request,
-  hdr-request_len))
+  cmd_len))
return -EFAULT;
+   if (len  BLK_MAX_CDB) {
+   rq-varlen_cdb_len = len;
+   rq-varlen_cdb = kmalloc(len, GFP_KERNEL);
+   if (rq-varlen_cdb == NULL)
+   return -ENOMEM;
+   if (copy_from_user(rq-varlen_cdb,
+  (void *)(unsigned long)hdr-request, len))
+   return -EFAULT;
+   }
+
if (blk_verify_command(rq-cmd, has_write_perm))
return -EPERM;
 
/*
 * fill in request structure
 */
-   rq-cmd_len = hdr-request_len;
+   rq-cmd_len = cmd_len;
rq-cmd_type = REQ_TYPE_BLOCK_PC;
 
rq-timeout = (hdr-timeout * HZ) / 1000;
@@ -252,12 +266,11 @@ static int blk_fill_sgv4_hdr_rq(request_queue_t *q, 
struct request *rq,
  * Check if sg_io_v4 from user is allowed and valid
  */
 static int
-bsg_validate_sgv4_hdr(request_queue_t *q, struct sg_io_v4 *hdr, int *rw)
+bsg_validate_sgv4_hdr(request_queue_t *q, struct sg_io_v4 *hdr,
+  enum dma_data_direction *dir)
 {
if (hdr-guard != 'Q')
return -EINVAL;
-   if (hdr-request_len  BLK_MAX_CDB)
-   return -EINVAL;
if (hdr-dout_xfer_len  (q-max_sectors  9) ||
hdr-din_xfer_len  (q-max_sectors  9))
return -EIO;
@@ -266,17 +279,15 @@ bsg_validate_sgv4_hdr(request_queue_t *q, struct sg_io_v4 
*hdr, int *rw)
if (hdr-protocol || hdr-subprotocol)
return -EINVAL;
 
-   /*
-* looks sane, if no data then it should be fine from our POV
-*/
-   if (!hdr-dout_xfer_len  !hdr-din_xfer_len)
-   return 0;
-
-   /* not supported currently */
-   if (hdr-dout_xfer_len  hdr-din_xfer_len)
-   return -EINVAL;
-
-   *rw = hdr-dout_xfer_len ? WRITE : READ;
+   if (hdr-dout_xfer_len) {
+   if (hdr-din_xfer_len)
+   *dir = DMA_BIDIRECTIONAL;
+   else
+   *dir = DMA_TO_DEVICE;
+   } else if (hdr-din_xfer_len)
+   *dir = DMA_FROM_DEVICE;
+   else
+   *dir = DMA_NONE;
 
return 0;
 }
@@ -289,7 +300,8 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr)
 {
request_queue_t *q = bd-queue;
struct request *rq;
-   int ret, rw = 0; /* shut up gcc */
+   enum dma_data_direction dir;
+   int ret;
unsigned int dxfer_len;
void *dxferp = NULL;
 
@@ -297,39 +309,45 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr)
hdr-dout_xfer_len, (unsigned long long) hdr-din_xferp,
hdr-din_xfer_len);
 
-   ret = bsg_validate_sgv4_hdr(q, hdr, rw);
+   ret = bsg_validate_sgv4_hdr(q, hdr, dir);
if (ret)
return ERR_PTR(ret);
 
/*
 * map scatter-gather elements seperately and string them to request
 */
-   rq = blk_get_request(q, rw, GFP_KERNEL);
+   rq = blk_get_request(q, dir, GFP_KERNEL);
ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, test_bit(BSG_F_WRITE_PERM,
   bd-flags));
-   if (ret) {
-   blk_put_request(rq);
-   return ERR_PTR(ret);
-   }
+   if (ret)
+   goto errout;
 
if (hdr-dout_xfer_len) {
dxfer_len = hdr-dout_xfer_len;
dxferp = (void*)(unsigned long)hdr-dout_xferp;
-   } else if (hdr-din_xfer_len) {
+   ret = blk_rq_map_user_bidi(q, rq, dxferp, dxfer_len, dir);
+   if (ret)
+   goto errout

[PATCH 1/2] bsg: compile with bidi

2007-04-24 Thread Pete Wyckoff
Fixes to common functions added with bsg so that they compile in
a kernel that has the bidi patches.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 block/bsg.c|6 +++---
 block/scsi_ioctl.c |6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index a333c93..f0753c0 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -368,7 +368,7 @@ static void bsg_add_command(struct bsg_device *bd, 
request_queue_t *q,
 * add bc command to busy queue and submit rq for io
 */
bc-rq = rq;
-   bc-bio = rq-bio;
+   bc-bio = rq_uni(rq)-bio;
bc-hdr.duration = jiffies;
spin_lock_irq(bd-lock);
list_add_tail(bc-list, bd-busy_list);
@@ -446,7 +446,7 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, 
struct sg_io_v4 *hdr,
hdr-info = 0;
if (hdr-device_status || hdr-transport_status || hdr-driver_status)
hdr-info |= SG_INFO_CHECK;
-   hdr-din_resid = rq-data_len;
+   hdr-din_resid = rq_uni(rq)-data_len;
hdr-response_len = 0;
 
if (rq-sense_len  hdr-response) {
@@ -915,7 +915,7 @@ bsg_ioctl(struct inode *inode, struct file *file, unsigned 
int cmd,
if (IS_ERR(rq))
return PTR_ERR(rq);
 
-   bio = rq-bio;
+   bio = rq_uni(rq)-bio;
blk_execute_rq(bd-queue, NULL, rq, 0);
blk_complete_sgv4_hdr_rq(rq, hdr, bio);
 
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index cb29ea1..c1cfae9 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -244,7 +244,7 @@ EXPORT_SYMBOL_GPL(blk_fill_sghdr_rq);
  */
 int blk_unmap_sghdr_rq(struct request *rq, struct sg_io_hdr *hdr)
 {
-   blk_rq_unmap_user(rq-bio);
+   blk_rq_unmap_user(rq_uni(rq)-bio);
blk_put_request(rq);
return 0;
 }
@@ -266,7 +266,7 @@ int blk_complete_sghdr_rq(struct request *rq, struct 
sg_io_hdr *hdr,
hdr-info = 0;
if (hdr-masked_status || hdr-host_status || hdr-driver_status)
hdr-info |= SG_INFO_CHECK;
-   hdr-resid = rq-data_len;
+   hdr-resid = rq_uni(rq)-data_len;
hdr-sb_len_wr = 0;
 
if (rq-sense_len  hdr-sbp) {
@@ -278,7 +278,7 @@ int blk_complete_sghdr_rq(struct request *rq, struct 
sg_io_hdr *hdr,
ret = -EFAULT;
}
 
-   rq-bio = bio;
+   rq_uni(rq)-bio = bio;
r = blk_unmap_sghdr_rq(rq, hdr);
if (ret)
r = ret;
-- 
1.5.0.6

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bsg: port to bidi

2007-04-24 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Tue, 24 Apr 2007 13:27 -0400:
 Changes required to make bsg patches work on top of bidi patches.  Adds
 capability to bsg to handle bidirectional commands and extended CDBs.

Oops.  This is 2/2.  Apply the compile with bidi one first.

-- Pete
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] bsg: iovec support with compat

2007-03-21 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Tue, 20 Mar 2007 03:22 +0900:
 From: Pete Wyckoff [EMAIL PROTECTED]
 Subject: [PATCH v2] bsg: iovec support with compat
 Date: Mon, 19 Mar 2007 14:07:27 -0400
 
  Adding a new bsg_write_compat would be bad.  There is lots of
  parsing and setup before we even get to the possibility of iovec
  data mapping.  Reusing just sg_build_ioctl from compat_ioctl.c is
  also suboptimal as this function is built around the idea of a
  contiguous sg_io_hdr + iovec in userspace.  The function is small
  enough that splitting it into a generic + ioctl-specific part would
  add too much abstraction to be worth it.
  
  Here is the patch to use sg_iovec, with its userspace void * and
  size_t, and the CONFIG_COMPAT code to fixup 32-bit userspace.  I'm
  not fond of having __u64 for non-iovec buffer representations, and
  void * for iovec buffer representations, but it saves having to
  build an sg_iovec just to call into the existing blk_rq_map_user_iov.
  
  Comments?
 
 The compat code should not go to bsg.c.

Agreed.  I dislike the entire approach of reusing sg_iovec in bsg,
but thought I'd work it through to see what it is like.

The compat code cannot go in compat_ioctl.c or similar, as explained
above.  And it seems wrong to leave it out and silently break on
compat-requiring setups.

Using sg_iovec with bsg is bad from a user perspective too.
sg_iovec pointers are (void *).  sg_io_v4 pointers are __u64.
sg_iovec lengths are size_t (64-bit).  sg_io_v4 lengths are __u32.

Please consider instead the original proposal of a new iovec
type for bsg.  It is less complex than using existing sg_iovec.
http://article.gmane.org/gmane.linux.scsi/30461

There is exactly one user of blk_rq_map_user_iov() in the tree:
sg_io() in scsi_ioctl.c.  I could convert that to sg_v4_iovec now
too.  The helper function bio_map_user_iov() is only used by
blk_rq_map_user_iov() and can also be fixed.  The only use we have
for struct sg_iovec is this one sg_io() caller, and sg.c.  Let's
migrate to sg_v4_iovec at the same time we switch from sgv3 to sgv4.

-- Pete
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] bsg: iovec support with explicit u64

2007-03-19 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Mon, 19 Mar 2007 14:07 -0400:
 Here is the patch to use sg_iovec, with its userspace void * and
 size_t, and the CONFIG_COMPAT code to fixup 32-bit userspace.  I'm
 not fond of having __u64 for non-iovec buffer representations, and
 void * for iovec buffer representations, but it saves having to
 build an sg_iovec just to call into the existing blk_rq_map_user_iov.

Just for comparison, a cleaned up version of the earlier patch that
does not require compat conversion.  Instead, use a new struct
sg_v4_iovec with explicit u64/u32 representation for user pointer
and length.

Perhaps later we might change blk_rq_map_user_iov to take
sg_v4_iovec and let sgv3 convert its sg_iovec into that.

-- Pete


Support vectored IO as in SGv3.  The iovec structure uses explicit
sizes to avoid the need for compat conversion.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 block/bsg.c |   95 ++
 include/linux/bsg.h |   14 +++
 2 files changed, 86 insertions(+), 23 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index f1ea258..e334f75 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -280,6 +280,56 @@ bsg_validate_sgv4_hdr(request_queue_t *q, struct sg_io_v4 
*hdr, int *rw)
 }
 
 /*
+ * Map either the in or out bufs.  Convert sg_v4_iovec to sg_iovec
+ * to use existing blk_rq_map infrastructure.  We could do a copy-in-place
+ * conversion on 64-bit kernels, just by zeroing the top half of sg_iovec's
+ * iov_len, but do not, for simplicity.
+ */
+static int bsg_map_data(struct request_queue *q, struct request *rq,
+   __u64 uaddr, __u32 tot_len, __u32 numiov,
+   enum dma_data_direction dir)
+{
+   int i, ret;
+   void __user *ubuf = (void __user *) (unsigned long) uaddr;
+   struct sg_iovec fastiov[8], *iov = fastiov;
+   struct sg_v4_iovec v4_fastiov[8], *v4_iov = v4_fastiov;
+
+   if (numiov == 0) {
+   ret = blk_rq_map_user(q, rq, ubuf, tot_len);
+   goto out;
+   }
+
+   if (numiov = ARRAY_SIZE(fastiov)) {
+   iov = kmalloc(numiov * (sizeof(*iov) + sizeof(*v4_iov)),
+ GFP_KERNEL);
+   if (iov == NULL) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   v4_iov = (void *) iov[numiov];
+   }
+
+   if (copy_from_user(v4_iov, ubuf, numiov * sizeof(*v4_iov))) {
+   ret = -EFAULT;
+   goto outfree;
+   }
+
+   for (i=0; inumiov; i++) {
+   iov[i].iov_base = (void *)(unsigned long) v4_iov[i].iov_base;
+   iov[i].iov_len = v4_iov[i].iov_len;
+   }
+
+   ret = blk_rq_map_user_iov(q, rq, iov, numiov, tot_len);
+
+outfree:
+   if (iov != fastiov)
+   kfree(iov);
+
+out:
+   return ret;
+}
+
+/*
  * map sg_io_v4 to a request.
  */
 static struct request *
@@ -288,12 +338,12 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr)
request_queue_t *q = bd-queue;
struct request *rq;
int ret, rw = 0; /* shut up gcc */
-   unsigned int dxfer_len;
-   void *dxferp = NULL;
 
-   dprintk(map hdr %llx/%u %llx/%u\n, (unsigned long long) 
hdr-dout_xferp,
-   hdr-dout_xfer_len, (unsigned long long) hdr-din_xferp,
-   hdr-din_xfer_len);
+   dprintk(map hdr %llx/%u/%u %llx/%u/%u\n,
+   (unsigned long long) hdr-dout_xferp, hdr-dout_xfer_len,
+   hdr-dout_iovec_count,
+   (unsigned long long) hdr-din_xferp, hdr-din_xfer_len,
+   hdr-din_iovec_count);
 
ret = bsg_validate_sgv4_hdr(q, hdr, rw);
if (ret)
@@ -305,29 +355,28 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr)
rq = blk_get_request(q, rw, GFP_KERNEL);
ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, test_bit(BSG_F_WRITE_PERM,
   bd-flags));
-   if (ret) {
-   blk_put_request(rq);
-   return ERR_PTR(ret);
-   }
+   if (ret)
+   goto errout;
 
if (hdr-dout_xfer_len) {
-   dxfer_len = hdr-dout_xfer_len;
-   dxferp = (void*)(unsigned long)hdr-dout_xferp;
+   ret = bsg_map_data(q, rq, hdr-dout_xferp, hdr-dout_xfer_len,
+  hdr-dout_iovec_count, DMA_TO_DEVICE);
+   if (ret)
+   goto errout;
} else if (hdr-din_xfer_len) {
-   dxfer_len = hdr-din_xfer_len;
-   dxferp = (void*)(unsigned long)hdr-din_xferp;
-   } else
-   dxfer_len = 0;
-
-   if (dxfer_len) {
-   ret = blk_rq_map_user(q, rq, dxferp, dxfer_len);
-   if (ret) {
-   dprintk(failed map at %d\n, ret);
-   blk_put_request(rq);
-   rq = ERR_PTR(ret

Re: SCSI Generic version 4 interface, release 1.2

2007-03-18 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Sat, 17 Mar 2007 18:13 -0500:
 Pete Wyckoff wrote:
  [EMAIL PROTECTED] wrote on Wed, 14 Mar 2007 12:18 -0400:
  After reviewing this post by Pete Wyckoff:
  http://marc.theaimsgroup.com/?l=linux-scsim=117278879816029w=2
 
  I decided to update my sg v4 interface document originally
  posted 20061106 which I will now call release 1.1 :
  http://lwn.net/Articles/208082/
 
  Pete was proposing to put back din_iovec_count and
  dout_iovec_count that had been dropped out of bsg but
  had been in release 1.1 . Hmm.
 
  Some other items have been picked up from the bsg
  implementation plus the suggestion from LSF'07 to
  add dout_resid.
 
  See the attachment, comments welcome.
  
  Do you want to define the iovec format too?  As I commented in my
  patch, v3 sg_iovec has pointers with 32/64-bit issues.  Would be
  nice to see you declare v4 sg_iovec as pure u64.  (By the way, don't
  use the patch:  casting from the new to the old can put junk in the
  top half of the 64-bit sg_iovec.iov_len).
  
  Another issue I wonder about is queue DMA alignment.  In bsg,
  blk_rq_map_user will use a bounce buffer if the user-supplied start
  and end addresses are not aligned.  sg will happily map user pages
  at any offset without checking, although I haven't checked if Mike's
  patches change this.  ll_rw_blk.c says regarding blk_rq_map_user:
 
 Currently and before any of my patches, sg checked the start
 if (((unsigned long)hp-dxferp 
 queue_dma_alignment(sdev-request_queue)) != 0)
 It did not check the start and end like blk_rq_map_user.
 
 With my updated patches, blk_rq_map_user is just checking the start.
 There was a thread about not checking the end on linux-scsi. I will try
 to dig it up. A couple kernels ago the bio map helpers were modified to
 only check the start.

Oh, I missed that.  There's this thread about USB needing end
alignment too, on all but the last iovec entry:

http://thread.gmane.org/gmane.linux.usb.devel/16230/focus=16409

-- Pete
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI Generic version 4 interface, release 1.2

2007-03-17 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Wed, 14 Mar 2007 12:18 -0400:
 After reviewing this post by Pete Wyckoff:
 http://marc.theaimsgroup.com/?l=linux-scsim=117278879816029w=2
 
 I decided to update my sg v4 interface document originally
 posted 20061106 which I will now call release 1.1 :
 http://lwn.net/Articles/208082/
 
 Pete was proposing to put back din_iovec_count and
 dout_iovec_count that had been dropped out of bsg but
 had been in release 1.1 . Hmm.
 
 Some other items have been picked up from the bsg
 implementation plus the suggestion from LSF'07 to
 add dout_resid.
 
 See the attachment, comments welcome.

Do you want to define the iovec format too?  As I commented in my
patch, v3 sg_iovec has pointers with 32/64-bit issues.  Would be
nice to see you declare v4 sg_iovec as pure u64.  (By the way, don't
use the patch:  casting from the new to the old can put junk in the
top half of the 64-bit sg_iovec.iov_len).

Another issue I wonder about is queue DMA alignment.  In bsg,
blk_rq_map_user will use a bounce buffer if the user-supplied start
and end addresses are not aligned.  sg will happily map user pages
at any offset without checking, although I haven't checked if Mike's
patches change this.  ll_rw_blk.c says regarding blk_rq_map_user:

We don't allow misaligned data like bio_map_user() does.  If the
user is using sg, they're expected to know the alignment
constraints and respect them accordingly.

Should this still be true for both iovec and non-iovec uses of sgv4?
I modified bsg to use iovec and ignore alignment issues, just like
sg, but left in the bounce buffer for non-iovec usage.  Seems
awkward.  scsi_ioctl's sg_io has the same odd situation:  non-iovec
is bounced, iovec must be aligned by user.

-- Pete

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] return valid data with sense

2007-03-13 Thread Pete Wyckoff
Some targets can return both valid data and sense information.
Always update the request data_len from the SCSI command residual.
Callers should interpret sense data to determine what parts of the
data are valid in case of a CHECK CONDITION status.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 drivers/scsi/scsi_lib.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 5f95570..be8e655 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -848,8 +848,8 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int 
good_bytes)
memcpy(req-sense, cmd-sense_buffer,  len);
req-sense_len = len;
}
-   } else
-   req-data_len = cmd-resid;
+   }
+   req-data_len = cmd-resid;
}
 
/*
-- 
1.5.0.2

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] block queue dma alignment zero

2007-03-01 Thread Pete Wyckoff
Make queue_dma_alignment return 0 if it was specifically set to
0.  Set it to the default 511 to keep the old behavior when it
was not explicitly set.  This permits devices with no particular
alignment restrictions to use direct IO from arbitrary addresses.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 block/ll_rw_blk.c  |2 ++
 include/linux/blkdev.h |7 +--
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index fb67897..ef1d1a8 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -1925,6 +1925,8 @@ blk_init_queue_node(request_fn_proc *rfn, spinlock_t 
*lock, int node_id)
blk_queue_max_hw_segments(q, MAX_HW_SEGMENTS);
blk_queue_max_phys_segments(q, MAX_PHYS_SEGMENTS);
 
+   blk_queue_dma_alignment(q, 511);
+
/*
 * all done
 */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 36a6eac..f416c2a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -819,12 +819,7 @@ static inline int bdev_hardsect_size(struct block_device 
*bdev)
 
 static inline int queue_dma_alignment(request_queue_t *q)
 {
-   int retval = 511;
-
-   if (q  q-dma_alignment)
-   retval = q-dma_alignment;
-
-   return retval;
+   return q ? q-dma_alignment : 511;
 }
 
 /* assumes size  256 */
-- 
1.5.0.2

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] iscsi tcp set queue dma alignment to zero

2007-03-01 Thread Pete Wyckoff
Add a slave_configure function to iSCSI TCP to remove any DMA
alignment restriction.  This permits the use of direct IO from
arbitrary addresses.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 drivers/scsi/iscsi_tcp.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 4376840..f48eedd 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -2132,6 +2132,16 @@ static void iscsi_tcp_session_destroy(struct 
iscsi_cls_session *cls_session)
iscsi_session_teardown(cls_session);
 }
 
+/*
+ * New device attached.  Turn off the DMA alignment restriction on
+ * the request queue.
+ */
+static int iscsi_tcp_slave_configure(struct scsi_device *sdev)
+{
+   blk_queue_dma_alignment(sdev-request_queue, 0);
+   return 0;
+}
+
 static struct scsi_host_template iscsi_sht = {
.name   = iSCSI Initiator over TCP/IP,
.queuecommand   = iscsi_queuecommand,
@@ -2142,6 +2152,7 @@ static struct scsi_host_template iscsi_sht = {
.eh_abort_handler   = iscsi_eh_abort,
.eh_host_reset_handler  = iscsi_eh_host_reset,
.use_clustering = DISABLE_CLUSTERING,
+   .slave_configure= iscsi_tcp_slave_configure,
.proc_name  = iscsi_tcp,
.this_id= -1,
 };
-- 
1.5.0.2

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] bsg: iovec support

2007-03-01 Thread Pete Wyckoff
Support vectored IO as in SGv3.  The iovec structure uses explicit
sizes to avoid the need for compat conversion.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---

My application definitely can take advantage of scatter/gather IO,
which is supported in sgv3 but not in the bsg implementation of sgv4.
I understand Tomo's concerns about code bloat and the need for
32/64 compat translations, but this will make things much easier on
users of bsg who read or write out of multiple buffers in a single
SCSI operation.

Clearly we want to avoid doing the compat work that sg.c has to do
now, so I went with __u64 for the addresses in the structures that
userspace sees.  But to interface with existing bio structures, that
must be converted back to 32-bit pointers in sg_iovec (only on
32-bit architectures).  In the long run, maybe we should have a
bio_map_user_iov() that works on the constant-sized sg_io_v4_vec
proposed here?

-- Pete


 block/bsg.c |  132 ++-
 include/linux/bsg.h |   16 ++
 2 files changed, 125 insertions(+), 23 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index c85d961..8e3d6c7 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -280,6 +280,95 @@ bsg_validate_sgv4_hdr(request_queue_t *q, struct sg_io_v4 
*hdr, int *rw)
 }
 
 /*
+ * Sits around blk_rq_map_user_iov so we can use an iovec type that
+ * does not require compat manipulations.  For now we just clumsily
+ * remap the entire iovec if the types do not match.  Later consider
+ * changing the bio map function.
+ */
+static int bsg_map_user_iovec(request_queue_t *q, struct request *rq,
+ struct sg_io_v4_vec *vec, int numvec,
+ size_t tot_len, enum dma_data_direction dir)
+{
+   struct bio *bio;
+   struct sg_iovec *iov;
+   int write_to_vm = (dir == DMA_FROM_DEVICE ? 1 : 0);
+   int must_copy_iovec = (sizeof(*iov) != sizeof(*vec));
+
+   /*
+* For 64-bit everywhere, sg_io_v4_vec using __u64 is same as sg_iovec
+* using void *.  For 64-bit kernel with 32-bit userspace, also no
+* translation needed as userspace is forced to use __u64.  Only in the
+* all 32-bit case will sg_iovec use 32-bit pointers and hence we
+* must shrink our 64-bit pointers down into it.
+*/
+   if (must_copy_iovec) {
+   int i;
+   iov = kmalloc(numvec * sizeof(*iov), GFP_KERNEL);
+   for (i=0; inumvec; i++) {
+   iov[i].iov_base = (void __user *) vec[i].iov_base;
+   iov[i].iov_len = vec[i].iov_len;
+   }
+   } else {
+   iov = (struct sg_iovec *) vec;
+   }
+
+   bio = bio_map_user_iov(q, NULL, iov, numvec, write_to_vm);
+
+   if (must_copy_iovec)
+   kfree(iov);
+
+   if (IS_ERR(bio)) {
+   dprintk(bio_map_user_iov err\n);
+   return PTR_ERR(bio);
+   }
+
+   if (bio-bi_size != tot_len) {
+   dprintk(bio-bi_size %u != len %lu\n, bio-bi_size, tot_len);
+   bio_endio(bio, bio-bi_size, 0);
+   bio_unmap_user(bio);
+   return -EINVAL;
+   }
+
+   bio_get(bio);
+   blk_rq_bio_prep_bidi(q, rq, bio, dir);
+   rq-buffer = rq-data = NULL;
+   return 0;
+}
+
+/*
+ * Map either the in or out bufs.
+ */
+static int bsg_map_data(struct request_queue *q, struct request *rq,
+   __u64 uaddr, __u32 tot_len, __u32 numiov,
+   enum dma_data_direction dir)
+{
+   int ret;
+   void __user *ubuf = (void __user *) (unsigned long) uaddr;
+
+   if (numiov) {
+   struct sg_io_v4_vec *vec;
+   size_t len = numiov * sizeof(*vec);
+
+   vec = kmalloc(len, GFP_KERNEL);
+   if (vec == NULL) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   if (copy_from_user(vec, ubuf, len)) {
+   ret = -EFAULT;
+   kfree(vec);
+   goto out;
+   }
+   ret = bsg_map_user_iovec(q, rq, vec, numiov, tot_len, dir);
+   kfree(vec);
+   } else
+   ret = blk_rq_map_user(q, rq, ubuf, tot_len);
+
+out:
+   return ret;
+}
+
+/*
  * map sg_io_v4 to a request.
  */
 static struct request *
@@ -288,12 +377,10 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr)
request_queue_t *q = bd-queue;
struct request *rq;
int ret, rw = 0; /* shut up gcc */
-   unsigned int dxfer_len;
-   void *dxferp = NULL;
 
-   dprintk(map hdr %llx/%u %llx/%u\n, (unsigned long long) 
hdr-dout_xferp,
-   hdr-dout_xfer_len, (unsigned long long) hdr-din_xferp,
-   hdr-din_xfer_len);
+   dprintk(map hdr %llx/%u %llx/%u\n,
+   (unsigned long long) hdr-dout_xferp, hdr

[PATCH] bsg: return SAM device status code

2007-02-23 Thread Pete Wyckoff
Use the status codes from the standard, not the shifted-by-one codes
that are marked deprecated in scsi.h.  This makes bsg v4 status
report the same value as sg v3 status too.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 block/bsg.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index c85d961..e39a321 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -438,7 +438,7 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, 
struct sg_io_v4 *hdr,
/*
 * fill in all the output members
 */
-   hdr-device_status = status_byte(rq-errors);
+   hdr-device_status = rq-errors  0xff;
hdr-transport_status = host_byte(rq-errors);
hdr-driver_status = driver_byte(rq-errors);
hdr-info = 0;
-- 
1.4.4.2

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bind bsg to request_queue instead of gendisk

2007-02-14 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Wed, 14 Feb 2007 02:53 +0900:
 It seems that it would be better to bind bsg devices to request_queue
 instead of gendisk. This enables any objects to define own
 request_handler and create own bsg device (under sysfs).
 
 Possible enhancements:
 
 - I removed gendisk but it would be better for objects having gendisk
 to keep it for nice features like disk stats.
 
 - Objects that wants to use bsg need to setup a request_queue. Maybe
 wrapper functions to setup a request_queue for them would be useful.
 
 This patch was tested only with disk drivers.

The only place that bsg_register_rq is called is via
blk_register_queue, which is only called by add_disk.  But not all
devices have a block interface or need the gendisk that these
functions assume.

This change registers all SCSI devices with bsg, even ones that are
not accessed through a block interface.  They already have a
request_queue; this just presents it to bsg at LUN scan time.
It uses the bus_id as the name for sysfs.

Devices that happen to be block SCSI devices are thus registered
twice, which is probably not good, but preserves the exsiting sda
entry, e.g., in /sys/class/bsg.

Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
---
 block/bsg.c   |   16 +---
 drivers/scsi/scsi_scan.c  |6 ++
 drivers/scsi/scsi_sysfs.c |1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index e261997..e100efe 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -962,6 +962,7 @@ int bsg_register_rq(struct request_queue *q, char *name)
 {
struct bsg_class_device *bcd;
dev_t dev;
+   int ret = -ENOMEM;
 
/*
 * we need a proper transport to send commands, not a stacked device
@@ -981,9 +982,18 @@ int bsg_register_rq(struct request_queue *q, char *name)
bcd-class_dev = class_device_create(bsg_class, NULL, dev, bcd-dev, 
%s, name);
if (!bcd-class_dev)
goto err;
+
+   /*
+* Put a link, e.g., /sys/block/sda/queue/bsg - /sys/class/bsg/sda,
+* if a sysfs entry for the queue exists.
+*/
+   if (q-kobj.dentry) {
+   ret = sysfs_create_link(q-kobj, bcd-class_dev-kobj, bsg);
+   if (ret)
+   goto err;
+   }
+
list_add_tail(bcd-list, bsg_class_list);
-   if (sysfs_create_link(q-kobj, bcd-class_dev-kobj, bsg))
-   goto err;
mutex_unlock(bsg_mutex);
return 0;
 err:
@@ -991,7 +1001,7 @@ err:
if (bcd-class_dev)
class_device_destroy(bsg_class, MKDEV(BSG_MAJOR, bcd-minor));
mutex_unlock(bsg_mutex);
-   return -ENOMEM;
+   return ret;
 }
 
 static int __init bsg_init(void)
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 23ab62e..8aa4c3c 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -895,6 +895,12 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned 
char *inq_result,
}
 
/*
+* Register bsg interface for every SCSI device.
+*/
+   if (bsg_register_rq(sdev-request_queue, sdev-sdev_gendev.bus_id))
+   return SCSI_SCAN_NO_RESPONSE;
+
+   /*
 * Ok, the device is now all set up, we can
 * register it and tell the rest of the kernel
 * about it.
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 259c90c..bc694a5 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -238,6 +238,7 @@ static void scsi_device_dev_release_usercontext(struct 
work_struct *work)
spin_unlock_irqrestore(sdev-host-host_lock, flags);
 
if (sdev-request_queue) {
+   bsg_unregister_rq(sdev-request_queue);
sdev-request_queue-queuedata = NULL;
/* user context needed to free queue */
scsi_free_queue(sdev-request_queue);
-- 
1.4.4.2

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html