libsas: correctly flush the LU queue on error recovery

2008-02-22 Thread James Bottomley
The current sas_scsi_clear_queue_lu() is wrongly checking for commands
which match the pointer to the one passed in.  It should be checking for
commands which are on the same logical unit as the one passed in.  Fix
this by checking target pointer and LUN for equality.

Signed-off-by: James Bottomley <[EMAIL PROTECTED]>

---

diff --git a/drivers/scsi/libsas/sas_scsi_host.c 
b/drivers/scsi/libsas/sas_scsi_host.c
index b796d99..57ba99f 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -282,7 +282,8 @@ static void sas_scsi_clear_queue_lu(struct list_head 
*error_q, struct scsi_cmnd
struct scsi_cmnd *cmd, *n;
 
list_for_each_entry_safe(cmd, n, error_q, eh_entry) {
-   if (cmd == my_cmd)
+   if (cmd->device->sdev_target == my_cmd->device->sdev_target &&
+   cmd->device->lun == my_cmd->device->lun)
sas_eh_finish_cmd(cmd);
}
 }


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


aic94xx: fix sequencer hang on error recovery

2008-02-22 Thread James Bottomley
The clear nexus I_T and clear nexus I_T_L functions in the aic94xx
specify the SUSPEND_TX flag which causes the sequencer to be suspended
until it receives a RESUME_TX.  Unfortunately, nothing ever sends the
resume, so the sequencer on the link is stopped forever, leading to
eventual timeouts and I/O errors.

Since clear nexus commands are only executed as part of error recovery,
it's perfectly fine to keep the sequencer running on the link ... as
soon as the recovery function is completed, we'll send it the commands
to retry.

Signed-off-by: James Bottomley <[EMAIL PROTECTED]>

---

diff --git a/drivers/scsi/aic94xx/aic94xx_tmf.c 
b/drivers/scsi/aic94xx/aic94xx_tmf.c
index b52124f..144f5ad 100644
--- a/drivers/scsi/aic94xx/aic94xx_tmf.c
+++ b/drivers/scsi/aic94xx/aic94xx_tmf.c
@@ -151,8 +151,6 @@ static int asd_clear_nexus_I_T(struct domain_device *dev)
CLEAR_NEXUS_PRE;
scb->clear_nexus.nexus = NEXUS_I_T;
scb->clear_nexus.flags = SEND_Q | EXEC_Q | NOTINQ;
-   if (dev->tproto)
-   scb->clear_nexus.flags |= SUSPEND_TX;
scb->clear_nexus.conn_handle = cpu_to_le16((u16)(unsigned long)
   dev->lldd_dev);
CLEAR_NEXUS_POST;
@@ -169,8 +167,6 @@ static int asd_clear_nexus_I_T_L(struct domain_device *dev, 
u8 *lun)
CLEAR_NEXUS_PRE;
scb->clear_nexus.nexus = NEXUS_I_T_L;
scb->clear_nexus.flags = SEND_Q | EXEC_Q | NOTINQ;
-   if (dev->tproto)
-   scb->clear_nexus.flags |= SUSPEND_TX;
memcpy(scb->clear_nexus.ssp_task.lun, lun, 8);
scb->clear_nexus.conn_handle = cpu_to_le16((u16)(unsigned long)
   dev->lldd_dev);


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: New 2.6.24.2 SG_IO SCSI problems

2008-02-22 Thread Tony Battersby

> I attached a backport of the patch from Tony (added as cc) that is in 
> 2.6.25-rc2. Could you try it out against 2.6.24.2 just to make sure it 
> was this patch, then we can send it to stable.
>   

Yes, I had wanted to send this patch to -stable, but got distracted with
other bugs.  So please do so, by all means.

Tony

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: New 2.6.24.2 SG_IO SCSI problems

2008-02-22 Thread Mike Christie

Mark Hounschell wrote:

Mark Hounschell wrote:

Mike Christie wrote:

Mike Christie wrote:

Mark Hounschell wrote:

I seem to have run into some sort of regression in the SG_IO
interface of 2.6.24.2. I have an application that up until 2.6.24
worked fine. The 2.6.23.16 kernel works fine.

During reads I get these kernel messages. Writes and other functions
_seem_ OK. Actually basic
reads  are working. Its with large BC reads using an io_vec list that
the problem shows up.


Are you doing SG_IO to the sg device (/dev/sg*) or to the block device
(/dev/sdX)?

If you are doing SG_IO to the sg device, then I know of one regression
(well not regression exactly, but I fixed a bug but the patch got
partially overwritten by another patch and that caused a new bug). Both
bugs are fixed in 2.6.25-rc2. Could you try that out if you are doing
SG_IO to the sg device.


Yes, I'm using /dev/sg*. And yes again I'll checkout 2.6.25-rc2 ASIC.

Thanks
Mark
-
 
2.6.25-rc2 does fix the problem I'm having. I don't suppose there is a patch

lying around for 2.6.24.2??



I attached a backport of the patch from Tony (added as cc) that is in 
2.6.25-rc2. Could you try it out against 2.6.24.2 just to make sure it 
was this patch, then we can send it to stable.
Backport
76d78300a6eb8b7f08e47703b7e68a659ffc2053
to 2.6.24

>From Tony Battersby:

When sending a SCSI command to a tape drive via the SCSI Generic (sg)
driver, if the command has a data transfer length more than
scatter_elem_sz (32 KB default) and not a multiple of 512, then I either
hit BUG_ON(!valid_dma_direction(direction)) in dma_unmap_sg() or else
the command never completes (depending on the LLDD).

When constructing scatterlists, the sg driver rounds up the scatterlist
element sizes to be a multiple of 512.  This can result in
sum(scatterlist lengths) > bufflen.  In this case, scsi_req_map_sg()
incorrectly sets bio->bi_size to sum(scatterlist lengths) rather than to
bufflen.  When the command completes, req_bio_endio() detects that
bio->bi_size != 0, and so it doesn't call bio_endio().  This causes the
command to be resubmitted, resulting in BUG_ON or the command never
completing.

This patch makes scsi_req_map_sg() set bio->bi_size to bufflen rather
than to sum(scatterlist lengths), which fixes the problem.

Signed-off-by: Mike Christie <[EMAIL PROTECTED]>

--- linux-2.6.24.2/drivers/scsi/scsi_lib.c	2008-02-10 23:51:11.0 -0600
+++ linux-2.6.24.2.work/drivers/scsi/scsi_lib.c	2008-02-22 16:20:09.0 -0600
@@ -298,7 +298,6 @@ static int scsi_req_map_sg(struct reques
 		page = sg_page(sg);
 		off = sg->offset;
 		len = sg->length;
- 		data_len += len;
 
 		while (len > 0 && data_len > 0) {
 			/*


Re: New 2.6.24.2 SG_IO SCSI problems

2008-02-22 Thread Mark Hounschell
Mark Hounschell wrote:
> Mike Christie wrote:
>> Mike Christie wrote:
>>> Mark Hounschell wrote:
 I seem to have run into some sort of regression in the SG_IO
 interface of 2.6.24.2. I have an application that up until 2.6.24
 worked fine. The 2.6.23.16 kernel works fine.

 During reads I get these kernel messages. Writes and other functions
 _seem_ OK. Actually basic
 reads  are working. Its with large BC reads using an io_vec list that
 the problem shows up.

>>> Are you doing SG_IO to the sg device (/dev/sg*) or to the block device
>>> (/dev/sdX)?
>> If you are doing SG_IO to the sg device, then I know of one regression
>> (well not regression exactly, but I fixed a bug but the patch got
>> partially overwritten by another patch and that caused a new bug). Both
>> bugs are fixed in 2.6.25-rc2. Could you try that out if you are doing
>> SG_IO to the sg device.
>>
> 
> Yes, I'm using /dev/sg*. And yes again I'll checkout 2.6.25-rc2 ASIC.
> 
> Thanks
> Mark
> -
 
2.6.25-rc2 does fix the problem I'm having. I don't suppose there is a patch
lying around for 2.6.24.2??

Thanks
Mark

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libsas: fix error handling

2008-02-22 Thread James Bottomley
On Wed, 2008-02-20 at 10:07 -0600, James Bottomley wrote:
> The libsas error handler has two fairly fatal bugs

A bug in this one was picked up in testing.  There's still a double
completion race from the time a task is aborted to the time the eh
actually starts.  If the task returns in that interval it will get
completed twice, so we key off the SAS_TASK_STATE_ABORTED to prevent
this (or at least narrow the race window to being almost untriggerable).

James

---

Subject: [SCSI] libsas: fix error handling

The libsas error handler has two fairly fatal bugs

1. scsi_sas_task_done calls scsi_eh_finish_cmd() too early.  This
   happens if the task completes after it has been aborted but before
   the error handler starts up.  Because scsi_eh_finish_cmd()
   decrements host_failed and adds the task to the done list, the
   error handler start check (host_failed == host_busy) never passes
   and the eh never starts.

2. The multiple task completion paths sas_scsi_clear_queue_... all
   simply delete the task from the error queue.  This causes it to
   disappear into the ether, since a command must be placed on the
   done queue to be finished off by the error handler.  This behaviour
   causes the HBA to hang on pending commands.

Fix 1. by moving the SAS_TASK_STATE_ABORTED check to an exit clause at
the top of the routine and calling ->scsi_done() unconditionally (it
is a nop if the timer has fired).  This keeps the task in the error
handling queue until the eh starts.

Fix 2. by making sure every task goes through task complete followed
by scsi_eh_finish_cmd().

Tested this by firing resets across a disk running a hammer test (now
it actually survives without hanging the system)

Signed-off-by: James Bottomley <[EMAIL PROTECTED]>
---
 drivers/scsi/libsas/sas_scsi_host.c |   65 ++-
 1 files changed, 41 insertions(+), 24 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c 
b/drivers/scsi/libsas/sas_scsi_host.c
index 583d249..b796d99 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -53,10 +53,14 @@ static void sas_scsi_task_done(struct sas_task *task)
 {
struct task_status_struct *ts = &task->task_status;
struct scsi_cmnd *sc = task->uldd_task;
-   struct sas_ha_struct *sas_ha = SHOST_TO_SAS_HA(sc->device->host);
-   unsigned ts_flags = task->task_state_flags;
int hs = 0, stat = 0;
 
+   if (unlikely(task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
+   /* Aborted tasks will be completed by the error handler */
+   SAS_DPRINTK("task done but aborted\n");
+   return;
+   }
+
if (unlikely(!sc)) {
SAS_DPRINTK("task_done called with non existing SCSI cmnd!\n");
list_del_init(&task->list);
@@ -122,11 +126,7 @@ static void sas_scsi_task_done(struct sas_task *task)
sc->result = (hs << 16) | stat;
list_del_init(&task->list);
sas_free_task(task);
-   /* This is very ugly but this is how SCSI Core works. */
-   if (ts_flags & SAS_TASK_STATE_ABORTED)
-   scsi_eh_finish_cmd(sc, &sas_ha->eh_done_q);
-   else
-   sc->scsi_done(sc);
+   sc->scsi_done(sc);
 }
 
 static enum task_attribute sas_scsi_get_task_attr(struct scsi_cmnd *cmd)
@@ -257,13 +257,33 @@ out:
return res;
 }
 
+static void sas_eh_finish_cmd(struct scsi_cmnd *cmd)
+{
+   struct sas_task *task = TO_SAS_TASK(cmd);
+   struct sas_ha_struct *sas_ha = SHOST_TO_SAS_HA(cmd->device->host);
+
+   /* remove the aborted task flag to allow the task to be
+* completed now. At this point, we only get called following
+* an actual abort of the task, so we should be guaranteed not
+* to be racing with any completions from the LLD (hence we
+* don't need the task state lock to clear the flag) */
+   task->task_state_flags &= ~SAS_TASK_STATE_ABORTED;
+   /* Now call task_done.  However, task will be free'd after
+* this */
+   task->task_done(task);
+   /* now finish the command and move it on to the error
+* handler done list, this also takes it off the
+* error handler pending list */
+   scsi_eh_finish_cmd(cmd, &sas_ha->eh_done_q);
+}
+
 static void sas_scsi_clear_queue_lu(struct list_head *error_q, struct 
scsi_cmnd *my_cmd)
 {
struct scsi_cmnd *cmd, *n;
 
list_for_each_entry_safe(cmd, n, error_q, eh_entry) {
if (cmd == my_cmd)
-   list_del_init(&cmd->eh_entry);
+   sas_eh_finish_cmd(cmd);
}
 }
 
@@ -276,7 +296,7 @@ static void sas_scsi_clear_queue_I_T(struct list_head 
*error_q,
struct domain_device *x = cmd_to_domain_dev(cmd);
 
if (x == dev)
-   list_del_init(&cmd->eh_entry);
+   sas_eh_finish_cmd(cmd);
}
 }
 
@@ -290,7 +310,7 @@ static void sas_scsi_clear

RE: [PATCH 1/2] stex: stex_direct_copy shouldn't call dma_map_sg

2008-02-22 Thread Ed Lin


>-Original Message-
>From: fujita [mailto:[EMAIL PROTECTED] On Behalf Of FUJITA Tomonori
>Sent: Friday, February 22, 2008 6:11 AM
>To: linux-scsi@vger.kernel.org
>Cc: FUJITA Tomonori; Ed Lin; James Bottomley
>Subject: [PATCH 1/2] stex: stex_direct_copy shouldn't call dma_map_sg
>
>
>stex_direct_copy copies an in-kernel buffer to a sg list in order to
>spoof some SCSI commands. stex_direct_copy calls dma_map_sg and then
>stex_internal_copy with the value that dma_map_sg returned. It calls
>scsi_kmap_atomic_sg to copy data.
>
>scsi_kmap_atomic_sg doesn't see sg->dma_length so if dma_map_sg merges
>sg entries, stex_internal_copy gets the smaller number of sg entries
>than the acutual number, which means it wrongly think that the data
>length in the sg list is shorter than the actual length.
>
>stex_direct_copy shouldn't call dma_map_sg and it doesn't need since
>this code path doesn't involve dma transfers. This patch removes
>stex_direct_copy and simply calls stex_internal_copy with the actual
>number of sg entries.
>
>Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]>
>Cc: Ed Lin <[EMAIL PROTECTED]>
>Cc: James Bottomley <[EMAIL PROTECTED]>
>---
> drivers/scsi/stex.c |   34 --
> 1 files changed, 12 insertions(+), 22 deletions(-)
>
>diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
>index 72f6d80..4b6861c 100644
>--- a/drivers/scsi/stex.c
>+++ b/drivers/scsi/stex.c
>@@ -461,23 +461,6 @@ static void stex_internal_copy(struct 
>scsi_cmnd *cmd,
>   }
> }
> 
>-static int stex_direct_copy(struct scsi_cmnd *cmd,
>-  const void *src, size_t count)
>-{
>-  size_t cp_len = count;
>-  int n_elem = 0;
>-
>-  n_elem = scsi_dma_map(cmd);
>-  if (n_elem < 0)
>-  return 0;
>-
>-  stex_internal_copy(cmd, src, &cp_len, n_elem, ST_TO_CMD);
>-
>-  scsi_dma_unmap(cmd);
>-
>-  return cp_len == count;
>-}
>-
> static void stex_controller_info(struct st_hba *hba, struct 
>st_ccb *ccb)
> {
>   struct st_frame *p;
>@@ -569,8 +552,10 @@ stex_queuecommand(struct scsi_cmnd *cmd, 
>void (* done)(struct scsi_cmnd *))
>   unsigned char page;
>   page = cmd->cmnd[2] & 0x3f;
>   if (page == 0x8 || page == 0x3f) {
>-  stex_direct_copy(cmd, ms10_caching_page,
>-  sizeof(ms10_caching_page));
>+  size_t cp_len = sizeof(ms10_caching_page);
>+  stex_internal_copy(cmd, ms10_caching_page,
>+ &cp_len, scsi_sg_count(cmd),
>+ ST_TO_CMD);
>   cmd->result = DID_OK << 16 | 
>COMMAND_COMPLETE << 8;
>   done(cmd);
>   } else
>@@ -599,8 +584,10 @@ stex_queuecommand(struct scsi_cmnd *cmd, 
>void (* done)(struct scsi_cmnd *))
>   if (id != host->max_id - 1)
>   break;
>   if (lun == 0 && (cmd->cmnd[1] & INQUIRY_EVPD) == 0) {
>-  stex_direct_copy(cmd, console_inq_page,
>-  sizeof(console_inq_page));
>+  size_t cp_len = sizeof(console_inq_page);
>+  stex_internal_copy(cmd, console_inq_page,
>+ &cp_len, scsi_sg_count(cmd),
>+ ST_TO_CMD);
>   cmd->result = DID_OK << 16 | 
>COMMAND_COMPLETE << 8;
>   done(cmd);
>   } else
>@@ -609,6 +596,7 @@ stex_queuecommand(struct scsi_cmnd *cmd, 
>void (* done)(struct scsi_cmnd *))
>   case PASSTHRU_CMD:
>   if (cmd->cmnd[1] == PASSTHRU_GET_DRVVER) {
>   struct st_drvver ver;
>+  size_t cp_len = sizeof(ver);
>   ver.major = ST_VER_MAJOR;
>   ver.minor = ST_VER_MINOR;
>   ver.oem = ST_OEM;
>@@ -616,7 +604,9 @@ stex_queuecommand(struct scsi_cmnd *cmd, 
>void (* done)(struct scsi_cmnd *))
>   ver.signature[0] = PASSTHRU_SIGNATURE;
>   ver.console_id = host->max_id - 1;
>   ver.host_no = hba->host->host_no;
>-  cmd->result = stex_direct_copy(cmd, 
>&ver, sizeof(ver)) ?
>+  stex_internal_copy(cmd, &ver, &cp_len,
>+ scsi_sg_count(cmd), 
>ST_TO_CMD);
>+  cmd->result = sizeof(ver) == cp_len ?
>   DID_OK << 16 | COMMAND_COMPLETE << 8 :
>   DID_ERROR << 16 | COMMAND_COMPLETE << 8;
>   done(cmd);
>-- 
>1.5.3.4
>
>

ACK patches 1-2.
I didn't think dma_map_sg could change the sg count. And clearly
it's unneeded if we can get sg count by scsi_sg_count. Thanks
for the patches.


Ed Lin
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message t

[PATCH] SCSI st: compile fix when DEBUG set to one

2008-02-22 Thread Kai Makisara
Remove the now useless counting of adjacent pages from the debugging code in
to make it compile when DEBUG is set non-zero.

Signed-off-by: Kai Makisara <[EMAIL PROTECTED]>
---
The patch does not touch any code compiled when DEBUG is zero and is suitable
for 2.5.25.

 drivers/scsi/st.c |   11 ---
 drivers/scsi/st.h |1 -
 2 files changed, 4 insertions(+), 8 deletions(-)

Index: linux-2.6.25-rc2-q/drivers/scsi/st.c
===
--- linux-2.6.25-rc2-q.orig/drivers/scsi/st.c
+++ linux-2.6.25-rc2-q/drivers/scsi/st.c
@@ -17,7 +17,7 @@
Last modified: 18-JAN-1998 Richard Gooch <[EMAIL PROTECTED]> Devfs support
  */
 
-static const char *verstr = "20080117";
+static const char *verstr = "20080221";
 
 #include 
 
@@ -1172,7 +1172,7 @@ static int st_open(struct inode *inode, 
STp->try_dio_now = STp->try_dio;
STp->recover_count = 0;
DEB( STp->nbr_waits = STp->nbr_finished = 0;
-STp->nbr_requests = STp->nbr_dio = STp->nbr_pages = 
STp->nbr_combinable = 0; )
+STp->nbr_requests = STp->nbr_dio = STp->nbr_pages = 0; )
 
retval = check_tape(STp, filp);
if (retval < 0)
@@ -1226,8 +1226,8 @@ static int st_flush(struct file *filp, f
}
 
DEBC( if (STp->nbr_requests)
-   printk(KERN_DEBUG "%s: Number of r/w requests %d, dio used in 
%d, pages %d (%d).\n",
-  name, STp->nbr_requests, STp->nbr_dio, STp->nbr_pages, 
STp->nbr_combinable));
+   printk(KERN_DEBUG "%s: Number of r/w requests %d, dio used in 
%d, pages %d.\n",
+  name, STp->nbr_requests, STp->nbr_dio, STp->nbr_pages));
 
if (STps->rw == ST_WRITING && !STp->pos_unknown) {
struct st_cmdstatus *cmdstatp = &STp->buffer->cmdstat;
@@ -1422,9 +1422,6 @@ static int setup_buffering(struct scsi_t
 if (STbp->do_dio) {
STp->nbr_dio++;
STp->nbr_pages += STbp->do_dio;
-   for (i=1; i < STbp->do_dio; i++)
-   if (page_to_pfn(STbp->sg[i].page) == 
page_to_pfn(STbp->sg[i-1].page) + 1)
-   STp->nbr_combinable++;
 }
)
} else
Index: linux-2.6.25-rc2-q/drivers/scsi/st.h
===
--- linux-2.6.25-rc2-q.orig/drivers/scsi/st.h
+++ linux-2.6.25-rc2-q/drivers/scsi/st.h
@@ -164,7 +164,6 @@ struct scsi_tape {
int nbr_requests;
int nbr_dio;
int nbr_pages;
-   int nbr_combinable;
unsigned char last_cmnd[6];
unsigned char last_sense[16];
 #endif
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: New 2.6.24.2 SG_IO SCSI problems

2008-02-22 Thread Mark Hounschell
Mike Christie wrote:
> Mike Christie wrote:
>> Mark Hounschell wrote:
>>> I seem to have run into some sort of regression in the SG_IO
>>> interface of 2.6.24.2. I have an application that up until 2.6.24
>>> worked fine. The 2.6.23.16 kernel works fine.
>>>
>>> During reads I get these kernel messages. Writes and other functions
>>> _seem_ OK. Actually basic
>>> reads  are working. Its with large BC reads using an io_vec list that
>>> the problem shows up.
>>>
>>
>> Are you doing SG_IO to the sg device (/dev/sg*) or to the block device
>> (/dev/sdX)?
> 
> If you are doing SG_IO to the sg device, then I know of one regression
> (well not regression exactly, but I fixed a bug but the patch got
> partially overwritten by another patch and that caused a new bug). Both
> bugs are fixed in 2.6.25-rc2. Could you try that out if you are doing
> SG_IO to the sg device.
> 

Yes, I'm using /dev/sg*. And yes again I'll checkout 2.6.25-rc2 ASIC.

Thanks
Mark
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: New 2.6.24.2 SG_IO SCSI problems

2008-02-22 Thread Mike Christie

Mike Christie wrote:

Mark Hounschell wrote:
I seem to have run into some sort of regression in the SG_IO interface 
of 2.6.24.2. I have an application that up until 2.6.24 worked fine. 
The 2.6.23.16 kernel works fine.


During reads I get these kernel messages. Writes and other functions 
_seem_ OK. Actually basic
reads  are working. Its with large BC reads using an io_vec list that 
the problem shows up.




Are you doing SG_IO to the sg device (/dev/sg*) or to the block device 
(/dev/sdX)?


If you are doing SG_IO to the sg device, then I know of one regression 
(well not regression exactly, but I fixed a bug but the patch got 
partially overwritten by another patch and that caused a new bug). Both 
bugs are fixed in 2.6.25-rc2. Could you try that out if you are doing 
SG_IO to the sg device.

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: New 2.6.24.2 SG_IO SCSI problems

2008-02-22 Thread Mike Christie

Mark Hounschell wrote:
I seem to have run into some sort of regression in the SG_IO interface of 2.6.24.2. 
I have an application that up until 2.6.24 worked fine. The 2.6.23.16 kernel works fine.


During reads I get these kernel messages. Writes and other functions _seem_ OK. 
Actually basic
reads  are working. Its with large BC reads using an io_vec list that the 
problem shows up.



Are you doing SG_IO to the sg device (/dev/sg*) or to the block device 
(/dev/sdX)?

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Marvell 6440 SAS/SATA driver

2008-02-22 Thread James Bottomley
On Fri, 2008-02-22 at 11:26 -0500, Jeff Garzik wrote:
> 4) [major] Your email was encoded in base64, which makes it difficult 
> for automated tools to handle, and difficult for some mail clients to 
> view and reply-to.

Yes, I echo this, there's also another problem it causes: The email
didn't actually get through the scsi reflector in part, I suspect,
because of the base64 content (anybody who wasn't on the direct to or cc
list is only seeing Jeff's reply).

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/1] scsi: megaraid_sas - Fix random failure of DCDB cmds with sense info

2008-02-22 Thread James Bottomley

On Fri, 2008-02-22 at 09:17 -0700, Yang, Bo wrote:
> James,
> 
> What is the status for this patch?  We need to submit more patches based
> on the acceptance of this patch.

OK, read the thread; Matthew is right.  What you propose would pretty
much destroy compat ioctl handling within the driver.  You need a compat
handler for MEGASAS_IOC_FW.

With your current patch you'd get a failure both from a 64 bit binary
running on x86-64 and if someone ran the x86 binary on ia64.

The driver already uses the compat infrastructure, it shouldn't be too
hard to add this in the correct manner.

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/1] scsi: megaraid_sas - Fix random failure of DCDB cmds with sense info

2008-02-22 Thread James Bottomley
On Fri, 2008-02-22 at 09:17 -0700, Yang, Bo wrote:
> James,
> 
> What is the status for this patch?  We need to submit more patches based
> on the acceptance of this patch.

Sorry, I didn't actually get this message because that was in the window
when my email server dropped off the internet.

You'll have to give me time to dig up the threads on marc.info and read
them.

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Marvell 6440 SAS/SATA driver

2008-02-22 Thread Jeff Garzik

Ke Wei wrote:

Added support for Expander. Based on version 0.1 for mvsas.


Signed-off-by: Ke Wei <[EMAIL PROTECTED]>
---
diff --git a/drivers/scsi/mvsas.c b/drivers/scsi/mvsas.c
old mode 100644
new mode 100755
index 03638b9..3c7a154
--- a/drivers/scsi/mvsas.c
+++ b/drivers/scsi/mvsas.c
@@ -2,6 +2,7 @@
mvsas.c - Marvell 88SE6440 SAS/SATA support
 
 	Copyright 2007 Red Hat, Inc.

+   Copyright 2008 Marvell. <[EMAIL PROTECTED]>
 
 	This program is free software; you can redistribute it and/or

modify it under the terms of the GNU General Public License as
@@ -25,6 +26,13 @@
  structures.  this permits elimination of all the le32_to_cpu()
  and cpu_to_le32() conversions.
 
+	Changelog:

+   2008-02-22  0.5 Added support for Expander.
+   2008-02-05  0.4 Added support for hotplug and wide port.
+   2008-01-22  0.3 Added support for SAS HD and SATA Devices.
+   2008-01-09  0.2 detect SAS disk.
+   2007-09-25  0.1 rough draft, Initial version.
+
  */
 
 #include 



Technical content:  looks good, ACK

Patch content:  looks diff'd against correct version, ACK

But we still have one major process problem, and a couple minor problems 
to fix:


1) [minor] please do not include a changelog in the source code.  That's 
what the git repository history is for.


2) [minor] Your patch description (email body) is incorrect.  It should 
describe all changes since version 0.1, the version you diff'd against:


Convert rough draft Marvell 6440 driver to a working driver.

Added support for SAS and SATA devices, hotplug, wide port,
and expanders.

3) [minor] Your email subject should reflect that you are updating 
version 0.1, the version you diff'd against:


[PATCH] mvsas: convert from rough draft to working driver

4) [major] Your email was encoded in base64, which makes it difficult 
for automated tools to handle, and difficult for some mail clients to 
view and reply-to.


It will require some email configuration on your part to disable this, 
and send the email as a text/plain message.


I've copied Saeed Bishara @ Marvell on this email.  Saeed has been 
successfully sending patch for the sata_mv driver (5040, 6080, 6042, 
etc.)  Maybe Saeed can advise you on his email setup?




In any case, once we fix this last problem -- base64 -- we can finally 
apply your patch and get things moving.


You are very close to having a working Linux kernel development setup, 
thanks for your patience!


Jeff




-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] scsi: megaraid_sas - Fix random failure of DCDB cmds with sense info

2008-02-22 Thread Matthew Wilcox
On Fri, Feb 22, 2008 at 09:17:33AM -0700, Yang, Bo wrote:
> James,
> 
> What is the status for this patch?  We need to submit more patches based
> on the acceptance of this patch.

I've NAKed it.  You need to use compat_ioctl.  That way your code will
work even if you run a 32-bit x86 application on ia64 (or a 64-bit
application on x86).

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/1] scsi: megaraid_sas - Fix random failure of DCDB cmds with sense info

2008-02-22 Thread Yang, Bo
James,

What is the status for this patch?  We need to submit more patches based
on the acceptance of this patch.

Thanks.

Bo Yang 

-Original Message-
From: Yang, Bo 
Sent: Thursday, November 29, 2007 1:24 PM
To: 'Matthew Wilcox'
Cc: linux-scsi@vger.kernel.org; [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED]; Patro, Sumant; Kolli, Neela
Subject: RE: [PATCH 1/1] scsi: megaraid_sas - Fix random failure of DCDB
cmds with sense info


Matthew,

I think my last email confused you. Here are the points for submitting
this patch:

1. It is only to add the ia64 support to our driver.  This is why we add
the ia64 flag.

2. We did not change the implementation for x86 and x86_64.  The
implementation for them with u32 * is there from day one.

3. What I wanted to convey in my previous mail is that if one uses a 64
bit application (s)he may need to test for all functionality again. 

Regards.

Bo Yang

-Original Message-
From: Matthew Wilcox [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 28, 2007 2:27 PM
To: Yang, Bo
Cc: linux-scsi@vger.kernel.org; [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED]; Patro, Sumant; Kolli, Neela
Subject: Re: [PATCH 1/1] scsi: megaraid_sas - Fix random failure of DCDB
cmds with sense info

On Wed, Nov 28, 2007 at 12:08:37PM -0700, Yang, Bo wrote:
> Matthew,
> 
> Yes, as I mentioned, our applications are built in 32-bit environment 
> (except for ia64).
> We may see a different behavior if we build it in 64-bit for x86-64. 

Then this patch isn't acceptable.  You need to find a way which works
for both 32-bit and 64-bit applications.

--
Intel are signing my paycheques ... these opinions are still mine "Bill,
look, we understand that you're interested in selling us this operating
system, but compare it to ours.  We can't possibly take such a
retrograde step."
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Actually using the sg table/chain code

2008-02-22 Thread Mike Christie

James Bottomley wrote:

Yes, I can buy the argument for filesystem I/Os.  What about tapes which
currently use the block queue and have internal home grown stuff to
handle larger transfers ... how are they supposed to set the larger
default sector size?  Just modify the bare q->max_sectors?



Sorry for the late response. I have been doing userspace stuff and not 
keeping up with linux-scsi :(


For scsi tape and passthrough (any place we use REQ_TYPE_BLOCK_PC like 
with st or sg or block/scsi_ioctl or bsg), the block/bio/scatterlist 
building code ignores q->max_sectors and uses q->max_hw_sectors.

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] ips: sg chaining support to the path to non I/O commands

2008-02-22 Thread FUJITA Tomonori
On Tue, 19 Feb 2008 04:39:00 -0800
"Salyzyn, Mark" <[EMAIL PROTECTED]> wrote:

> ACK

Thanks!


> Other RAID drivers (eg: aacraid) makes the assumption that commands
> in these paths (INQUIRY, READ CAPACITY, MODE SENSE etc spoofing) are
> single scatter gather elements and have yet to be bitten. I agree
> with Fujita-san about the practical unlikelihood. The fix does not
> incur any change in code overhead, so it is worth hardening!
>
> Can one create a request via /dev/sg for a buffer smaller than 256
> and manage to create a multi-element scatter gather?

I think that we can do post 2.6.24 since we relaxed the default
alignment requirements (from 511 to 3). So a buffer more than 8 bytes
can leads to multi-element scatter gathers though it rarely happens.

Shortly I will submit the helper functions to copy data between sg
list and a buffer. It can replace aac_internal_transfer but it's not
for scsi-rc-fixes. If you worry that aac_internal_transfer can't
handle multiple sg entries, you can do something like this, I think:


diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index ae5f74f..73fa7c2 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -455,6 +455,12 @@ static int aac_slave_configure(struct scsi_device *sdev)
return 0;
 }
 
+static int aac_slave_alloc(struct scsi_device *sdev)
+{
+   blk_queue_update_dma_alignment(sdev->request_queue, (512 - 1));
+   return 0;
+}
+
 /**
  * aac_change_queue_depth  -   alter queue depths
  * @sdev:  SCSI device we are considering
@@ -1015,6 +1021,7 @@ static struct scsi_host_template aac_driver_template = {
.queuecommand   = aac_queuecommand,
.bios_param = aac_biosparm,
.shost_attrs= aac_attrs,
+   .slave_alloc= aac_slave_alloc,
.slave_configure= aac_slave_configure,
.change_queue_depth = aac_change_queue_depth,
.sdev_attrs = aac_dev_attrs,



=
Here's a malicious version of sg_inq, which tries to create multiple
sg entries:

=
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define INQ_REPLY_LEN 96
#define INQ_CMD_LEN 6

int main(int argc, char **argv)
{
struct sg_io_hdr hdr;
unsigned char scb[INQ_CMD_LEN] = {0x12, 0, 0, 0, INQ_REPLY_LEN, 0};
unsigned char sense[32];
void *buf;
int offset = 4096 - 4;
int fd, ret;

buf = valloc(8192);

memset(&hdr, 0, sizeof(hdr));

hdr.interface_id = 'S';
hdr.cmd_len = sizeof(scb);
hdr.mx_sb_len = sizeof(sense);
hdr.dxfer_direction = SG_DXFER_FROM_DEV;
hdr.dxfer_len = INQ_REPLY_LEN;
hdr.dxferp = buf + offset;
hdr.cmdp = scb;
hdr.sbp = sense;
hdr.flags |= SG_FLAG_DIRECT_IO;

fd = open(argv[1], O_RDONLY);
if (fd < 0) {
fprintf(stderr, "fail to open %s\n", argv[1]);
return fd;
}

ret = ioctl(fd, SG_IO, &hdr);
if (ret < 0) {
fprintf(stderr, "fail to ioctl %d\n", ret);
return ret;
}

return ret;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] stex: stex_direct_copy shouldn't call dma_map_sg

2008-02-22 Thread FUJITA Tomonori
stex_direct_copy copies an in-kernel buffer to a sg list in order to
spoof some SCSI commands. stex_direct_copy calls dma_map_sg and then
stex_internal_copy with the value that dma_map_sg returned. It calls
scsi_kmap_atomic_sg to copy data.

scsi_kmap_atomic_sg doesn't see sg->dma_length so if dma_map_sg merges
sg entries, stex_internal_copy gets the smaller number of sg entries
than the acutual number, which means it wrongly think that the data
length in the sg list is shorter than the actual length.

stex_direct_copy shouldn't call dma_map_sg and it doesn't need since
this code path doesn't involve dma transfers. This patch removes
stex_direct_copy and simply calls stex_internal_copy with the actual
number of sg entries.

Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]>
Cc: Ed Lin <[EMAIL PROTECTED]>
Cc: James Bottomley <[EMAIL PROTECTED]>
---
 drivers/scsi/stex.c |   34 --
 1 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 72f6d80..4b6861c 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -461,23 +461,6 @@ static void stex_internal_copy(struct scsi_cmnd *cmd,
}
 }
 
-static int stex_direct_copy(struct scsi_cmnd *cmd,
-   const void *src, size_t count)
-{
-   size_t cp_len = count;
-   int n_elem = 0;
-
-   n_elem = scsi_dma_map(cmd);
-   if (n_elem < 0)
-   return 0;
-
-   stex_internal_copy(cmd, src, &cp_len, n_elem, ST_TO_CMD);
-
-   scsi_dma_unmap(cmd);
-
-   return cp_len == count;
-}
-
 static void stex_controller_info(struct st_hba *hba, struct st_ccb *ccb)
 {
struct st_frame *p;
@@ -569,8 +552,10 @@ stex_queuecommand(struct scsi_cmnd *cmd, void (* 
done)(struct scsi_cmnd *))
unsigned char page;
page = cmd->cmnd[2] & 0x3f;
if (page == 0x8 || page == 0x3f) {
-   stex_direct_copy(cmd, ms10_caching_page,
-   sizeof(ms10_caching_page));
+   size_t cp_len = sizeof(ms10_caching_page);
+   stex_internal_copy(cmd, ms10_caching_page,
+  &cp_len, scsi_sg_count(cmd),
+  ST_TO_CMD);
cmd->result = DID_OK << 16 | COMMAND_COMPLETE << 8;
done(cmd);
} else
@@ -599,8 +584,10 @@ stex_queuecommand(struct scsi_cmnd *cmd, void (* 
done)(struct scsi_cmnd *))
if (id != host->max_id - 1)
break;
if (lun == 0 && (cmd->cmnd[1] & INQUIRY_EVPD) == 0) {
-   stex_direct_copy(cmd, console_inq_page,
-   sizeof(console_inq_page));
+   size_t cp_len = sizeof(console_inq_page);
+   stex_internal_copy(cmd, console_inq_page,
+  &cp_len, scsi_sg_count(cmd),
+  ST_TO_CMD);
cmd->result = DID_OK << 16 | COMMAND_COMPLETE << 8;
done(cmd);
} else
@@ -609,6 +596,7 @@ stex_queuecommand(struct scsi_cmnd *cmd, void (* 
done)(struct scsi_cmnd *))
case PASSTHRU_CMD:
if (cmd->cmnd[1] == PASSTHRU_GET_DRVVER) {
struct st_drvver ver;
+   size_t cp_len = sizeof(ver);
ver.major = ST_VER_MAJOR;
ver.minor = ST_VER_MINOR;
ver.oem = ST_OEM;
@@ -616,7 +604,9 @@ stex_queuecommand(struct scsi_cmnd *cmd, void (* 
done)(struct scsi_cmnd *))
ver.signature[0] = PASSTHRU_SIGNATURE;
ver.console_id = host->max_id - 1;
ver.host_no = hba->host->host_no;
-   cmd->result = stex_direct_copy(cmd, &ver, sizeof(ver)) ?
+   stex_internal_copy(cmd, &ver, &cp_len,
+  scsi_sg_count(cmd), ST_TO_CMD);
+   cmd->result = sizeof(ver) == cp_len ?
DID_OK << 16 | COMMAND_COMPLETE << 8 :
DID_ERROR << 16 | COMMAND_COMPLETE << 8;
done(cmd);
-- 
1.5.3.4

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] stex: stex_internal_copy should be called with sg_count in struct st_ccb

2008-02-22 Thread FUJITA Tomonori
stex_internal_copy copies an in-kernel buffer to a sg list by using
scsi_kmap_atomic_sg. Some functions calls stex_internal_copy with
sg_count in struct st_ccb, which is the value that dma_map_sg
returned. However it might be shorter than the actual number of sg
entries (if the IOMMU merged the sg entries).

scsi_kmap_atomic_sg doesn't see sg->dma_length so stex_internal_copy
should be called with the actual number of sg entries
(i.e. scsi_sg_count), because if the sg entries were merged,
stex_direct_copy wrongly think that the data length in the sg list is
shorter than the actual length.

Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]>
Cc: Ed Lin <[EMAIL PROTECTED]>
Cc: James Bottomley <[EMAIL PROTECTED]>
---
 drivers/scsi/stex.c |   10 ++
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 4b6861c..654430e 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -467,7 +467,8 @@ static void stex_controller_info(struct st_hba *hba, struct 
st_ccb *ccb)
size_t count = sizeof(struct st_frame);
 
p = hba->copy_buffer;
-   stex_internal_copy(ccb->cmd, p, &count, ccb->sg_count, ST_FROM_CMD);
+   stex_internal_copy(ccb->cmd, p, &count, scsi_sg_count(ccb->cmd),
+  ST_FROM_CMD);
memset(p->base, 0, sizeof(u32)*6);
*(unsigned long *)(p->base) = pci_resource_start(hba->pdev, 0);
p->rom_addr = 0;
@@ -485,7 +486,8 @@ static void stex_controller_info(struct st_hba *hba, struct 
st_ccb *ccb)
p->subid =
hba->pdev->subsystem_vendor << 16 | hba->pdev->subsystem_device;
 
-   stex_internal_copy(ccb->cmd, p, &count, ccb->sg_count, ST_TO_CMD);
+   stex_internal_copy(ccb->cmd, p, &count, scsi_sg_count(ccb->cmd),
+  ST_TO_CMD);
 }
 
 static void
@@ -699,7 +701,7 @@ static void stex_copy_data(struct st_ccb *ccb,
if (ccb->cmd == NULL)
return;
stex_internal_copy(ccb->cmd,
-   resp->variable, &count, ccb->sg_count, ST_TO_CMD);
+   resp->variable, &count, scsi_sg_count(ccb->cmd), ST_TO_CMD);
 }
 
 static void stex_ys_commands(struct st_hba *hba,
@@ -724,7 +726,7 @@ static void stex_ys_commands(struct st_hba *hba,
 
count = STEX_EXTRA_SIZE;
stex_internal_copy(ccb->cmd, hba->copy_buffer,
-   &count, ccb->sg_count, ST_FROM_CMD);
+   &count, scsi_sg_count(ccb->cmd), ST_FROM_CMD);
inq_data = (ST_INQ *)hba->copy_buffer;
if (inq_data->DeviceTypeQualifier != 0)
ccb->srb_status = SRB_STATUS_SELECTION_TIMEOUT;
-- 
1.5.3.4

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libsas: fix error handling

2008-02-22 Thread Luben Tuikov
--- On Thu, 2/21/08, James Bottomley <[EMAIL PROTECTED]> wrote:
> Look, Luben; I don't mind you making an arse of
> yourself on the mailing
> lists; that's about par for the course nowadays.

No, you just did in this thread by pretending to be any kind of
authority or have an expertise on how to fix this problem, while in
fact to properly address it, one really needs knowledge of the
protocol, how the HW works and how the sequencer works, and
a protocol trace of this specific problem.  And then fixing the
problem after it's been fully understood is yet another feat.

It's unfortunate that you feel the need to attack me and call
me names.  You need to self-examine yourself and figure out
and fix your hostile attitude.

I've mostly stopped exposing your complete and utter incompetence
in SCSI infrastructure and left the weed grow, but calling me names
on public lists is very unprofessional.

Speaking of "arse", a recent example of one of your "pearls" is this:
http://marc.info/?l=linux-ide&m=120291905515015&w=2
"However, I'm happy to be proven wrong ... anyone on this thread is
 welcome to come up with a userland enclosure infrastructure.  Once it
 does everything the in-kernel one does (which is really about the
 minimal possible set), I'll be glad to erase the in-kernel one."

You introduce unnecessary code, the vendors tell you they don't
agree with you, you respond with "I'm happy to be proven wrong" in spite
of everyone's opinion differing yours.  You clearly decide to do what
you think is right without listening to vendors or people in the
industry.  Also, you completely ignore sg_ses(8).

There are other examples, this one is just more recent.

> However, failing to report serious bugs you knew about in
> your code, and
> have known about for two years, is tantamount to sabotage
> ... especially

In 2005, you decided to take a working code out of a GIT tree,
change it privately, and then submit it to GIT as new. Since of
course GIT history is immutable, there is no way to sync back to the
solution I had for users to inspect, evaluate and compare to your
derived code, since you took it out of GIT and changed it privately
before submitting it back to the kernel (I had a GIT repo with
already integrated working code, synced daily to Linus' kernel).  Then
a lot of people started asking me: "It doesn't work" and "SATA
doesn't work", etc, and I had to explain over and over again
what's happened.  So please do not talk about sabotage.

You changed the code quite a bit, to the point where patches
between what I maintain and what you decided to fork off
and make it your own play-ground were different and patches wouldn't
apply cleanly or at all.

For this thread, I've looked at the "error handling" code of what's
in the kernel for aic94xx and your version of the "libsas" and
frankly, I cannot submit patches to something which completely does
NOT follow the Sequencer Interface Spec for AIC94xx series of chips.
For example, handling of REQ_TASK_ABORT -- its implementation is
quite naive and out of spec, i.e. already broken.  I cannot submit
patches to something that I know is already broken, which had
been changed from a per-spec code version.

You know, people are still asking me to sync up to kernel
x.y.z the SAS/SATL+aic94xx I maintain, because they tell me "have been
running it for 2 years on production hardware [i.e. IBM] and no
problems and I just need to upgrade the kernel".

Will you also attack me for having a SATL as part of my SAS code
which I didn't submit to the kernel?  Do you know how much code
is out there that vendors don't bother to try to integrate because
of your attitude such as this?

> when you've been reading the bug reports on the mailing
> list.  I have to
> wonder how many more such bugs you've left in the code
> for users to
> find.

I didn't "leave" any bugs.  Bugs are just part of the development
cycle.  You should know this.

aic94xx is an old product, most customers have upgraded.
The ones who still have IBM servers with this chip, have had the
need to upgrade their kernels for various other reasons and since the
in-kernel solution has failed them, have requested to run my code.

> > BTW, the problem you're "debugging" may
> require a protocol
> > trace and a sequencer update by Adaptec.
> 
> I think you're fully aware that my test rig consists of
> a couple of SAS
> cards and drives donated by IBM and two expanders donated
> by LSI, plus a bit of equipment scavenged from Intel.

No, I'm not aware of this.  I have absolutely no interest in what your
"test rig" consists of.

Here is a question though: why do you even have SAS HW?  You don't
work for a SAS HW vendor and you don't have expertise in SAS
or SCSI in general.  Do you have hardware for each and every
LLDD that's in the kernel?

> I have no access to sophisticated tools like protocol analyzers.

See above.  You don't need to have access to such things unless
you worked for a vendor, and your job is to make your employer's

Re: New 2.6.24.2 SG_IO SCSI problems

2008-02-22 Thread Mark Hounschell
Mark Hounschell wrote:
> James Bottomley wrote:
>> On Thu, 2008-02-21 at 10:15 -0500, Mark Hounschell wrote:
>>> I seem to have run into some sort of regression in the SG_IO interface of 
>>> 2.6.24.2. 
>>> I have an application that up until 2.6.24 worked fine. The 2.6.23.16 
>>> kernel works fine.
>>>
>>> During reads I get these kernel messages. Writes and other functions _seem_ 
>>> OK. Actually basic
>>> reads  are working. Its with large BC reads using an io_vec list that the 
>>> problem shows up.
>>>
>>> Feb 21 09:27:51 harley kernel: (scsi1:A:2:0): data overrun detected in 
>>> Data-in phase.  Tag == 0x1.
>>> Feb 21 09:27:51 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length 
>>> = 256.  NumSGs = 1.
>>> Feb 21 09:27:51 harley kernel: sg[0] - Addr 0x06256100 : Length 256
>> Help me a little here.  What was the io_vec and command you sent in to
>> produce this?  The aic debugging information implies a single element sg
>> list for a 256 byte read.
>>
>> James
>>
>>
>>
> 
> Well, I did no 256 byte xfer at all.  My failing io_vec list has 6 elements.
> The first 5 are for byte counts of 0xfffc and the last 0x6114. See below.
> 
> This is some debug info from within my app of the commands leading up the 
> failure:
> If you need actual values of the io_vec lists I will need to add some 
> additional debug
> info into the app. I will do if needed.
> 
> The disk BTW is formated at 768 byte sector size.
>  
> The first read has a 2 element io_vec list and reports no error:
> ScsiDev_thread_7e00: Read CBD = 0x08 0x00 0x00 0x00 0x01 0x00
> ScsiDev_thread_7e00: Read1(1) bc = 0x0078 addr = 0x00 Skip 0
> ScsiDev_thread_7e00: Read(2) bc = 0x0288 addr = 0xb6cea368 Skip 1
> ScsiDev_thread_7e00: SRead  DC ops = 2 short_bc = 288
> 
> The second read has a 4 element io_vec list and reports no error:
> ScsiDev_thread_7e00: Read CBD = 0x08 0x00 0x00 0x00 0x05 0x00
> ScsiDev_thread_7e00: Read1(1) bc = 0x0780 addr = 0x00 Skip 0
> ScsiDev_thread_7e00: Read1(2) bc = 0x0670 addr = 0x00 Skip 1
> ScsiDev_thread_7e00: Read1(3) bc = 0x003c addr = 0x000200 Skip 0
> ScsiDev_thread_7e00: SRead(4) bc = 0x022c addr = 0xb6cea368 Skip 1
> ScsiDev_thread_7e00: Read  DC ops = 4 short_bc = 22c
> 
> There is a seek here that reports no error:
> ScsiDev_thread_7e00: Seek address 2752 BPS 768
> 
> 
> This read has a 6 element io_vec list and reports  the error to the app:
> ScsiDev_thread_7e00: ReadX CBD = 0x28 0x00 0x00 0x00 0x27 0x52 0x00 0x01 0xcb 
> 0x00
> ScsiDev_thread_7e00: Read1(1) bc = 0xfffc addr = 0x000784 Skip 0
> ScsiDev_thread_7e00: Read1(2) bc = 0xfffc addr = 0x010780 Skip 0
> ScsiDev_thread_7e00: Read1(3) bc = 0xfffc addr = 0x02077c Skip 0
> ScsiDev_thread_7e00: Read1(4) bc = 0xfffc addr = 0x030778 Skip 0
> ScsiDev_thread_7e00: Read1(5) bc = 0xfffc addr = 0x040774 Skip 0
> ScsiDev_thread_7e00: Read1(6) bc = 0x6114 addr = 0x050770 Skip 0
> ScsiDev_thread_7e00: Read  DC ops = 6 short_bc = 0
> 
> ScsiDev_thread_7e00: scsi = 0x0 msg 0x0 host = 0x0007 driver = 0x
> ScsiDev_thread_7e00: Read error: sns = 0x00 residual = 0x
> ScsiDev_thread_7e00: Posting IPL status 0x0090 0x000e for Suba  
> to loc 0x00
> ScsiDev_thread_7e00: Sleeping!!
> 
> 
> Below is the complete dump of kernel messages for the above. I assume they are
> a result of the last failing read but there is an awfull lot there just for 
> that 6 element
> io_vec list. Sorry to put all this in there but wanted to you to get the idea.
> 
> 
> Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in 
> Data-in phase.  Tag == 0x1.
> Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 
> 256.  NumSGs = 1.
> Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256
.
. Snip
.
> Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in 
> Data-in phase.  Tag == 0x1.
> Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase.  Length = 
> 256.  NumSGs = 1.
> Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256
> 
> Again, sorry for all that.
> 

FWIW, on a different machine doing the same thing I get only a single
kernel message and NOT the tons shown above.

Regards
Mark

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: LSI Logic MegaRAID SATA 150-4 / LSI Logic New Generation RAID Device Drivers (MEGARAID_NEWGEN) problems (megaraid abort: scsi cmd:14600, do now own)

2008-02-22 Thread Andrew Morton
(cc's added)

On Mon, 18 Feb 2008 21:09:22 -0500 "David M. Strang" <[EMAIL PROTECTED]> wrote:

> Greetings -
> 
> A couple months back I purchased a LSI Logic MegaRAID ATA 150-4 
> controller, as well as 3 Seagate 500GB SATA-II hard drives to use in my 
> system. Previously, I was using a pair of WD4000YR's in software raid, 
> which seemed to work well. I've just not gotten around to working on 
> migrating my data to these new drivers + controller, and it's giving me 
> some issues. As with most, I'm having some severe performance issues, 
> the performance is simply abysmal. Before getting into the details, here 
> is a quick overview of my configuration:
> 
> System:
> Tyan Tiger i7320/R (S5350) System Board
> 2x Intel Xeon 3.0 GHz
> 4GB RAM
> 
> LSI Logic MegaRAID ATA 150-4 controller -  Firmware Revision: 713S
> 3x Seagate 7200.10 (Perpendicular Recording) ST3500630AS 500GB SATA-II 
> drives configured as a RAID-1 array with a HotSpare.
> 
> Also, connected to the onboard controller is a WD4000YR, where all of my 
> data currently resides.
> 
> I'm running Gentoo Hardended AMD64 MultiLib 
> (/usr/portage/profiles/hardened/amd64/multilib)
> 
> My current kernel revision is 2.6.23-hardened-r7.
> 
> Here are some (possibly) relevant snippets from dmesg during startup:
> 
> ...
> megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
> megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
> megaraid: probe new device 0x1000:0x1960:0x1000:0x4523: bus 3:slot 3:func 0
> ACPI: PCI Interrupt :03:03.0[A] -> GSI 24 (level, low) -> IRQ 24
> megaraid: fw version:[713S] bios version:[G121]
> scsi0 : LSI Logic MegaRAID driver
> scsi[0]: scanning scsi channel 0 [Phy 0] for non-raid devices
> scsi[0]: scanning scsi channel 1 [virtual] for logical drives
> scsi 0:1:0:0: Direct-Access MegaRAID LD 0 RAID1  476G 713S PQ: 0 ANSI: 2
> sd 0:1:0:0: [sda] 976762880 512-byte hardware sectors (500103 MB)
> sd 0:1:0:0: [sda] Write Protect is off
> sd 0:1:0:0: [sda] Mode Sense: 00 00 00 00
> sd 0:1:0:0: [sda] Asking for cache data failed
> sd 0:1:0:0: [sda] Assuming drive cache: write through
> sd 0:1:0:0: [sda] 976762880 512-byte hardware sectors (500103 MB)
> sd 0:1:0:0: [sda] Write Protect is off
> sd 0:1:0:0: [sda] Mode Sense: 00 00 00 00
> sd 0:1:0:0: [sda] Asking for cache data failed
> sd 0:1:0:0: [sda] Assuming drive cache: write through
>  sda: sda1 sda2 sda3 sda4
> sd 0:1:0:0: [sda] Attached SCSI disk
> ata_piix :00:1f.2: version 2.12
> ata_piix :00:1f.2: MAP [ P0 -- P1 -- ]
> ACPI: PCI Interrupt :00:1f.2[A] -> GSI 18 (level, low) -> IRQ 18
> PCI: Setting latency timer of device :00:1f.2 to 64
> scsi1 : ata_piix
> scsi2 : ata_piix
> ata1: SATA max UDMA/133 cmd 0x000114a0 ctl 0x0001149a 
> bmdma 0x00011470 irq 18
> ata2: SATA max UDMA/133 cmd 0x00011490 ctl 0x00011486 
> bmdma 0x00011478 irq 18
> ata1.00: ATA-7: WDC WD4000YR-01PLB0, 01.06A01, max UDMA/133
> ata1.00: 781422768 sectors, multi 16: LBA48 NCQ (depth 0/32)
> ata1.00: configured for UDMA/133
> scsi 1:0:0:0: Direct-Access ATA  WDC WD4000YR-01P 01.0 PQ: 0 ANSI: 5
> sd 1:0:0:0: [sdb] 781422768 512-byte hardware sectors (400088 MB)
> sd 1:0:0:0: [sdb] Write Protect is off
> sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't 
> support DPO or FUA
> sd 1:0:0:0: [sdb] 781422768 512-byte hardware sectors (400088 MB)
> sd 1:0:0:0: [sdb] Write Protect is off
> sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't 
> support DPO or FUA
>  sdb: sdb1 sdb2 sdb3 sdb4
> sd 1:0:0:0: [sdb] Attached SCSI disk
> ...
> 
> My controller is configured for Write Back Caching, Adaptive Read Ahead, 
> and Direct I/O (I've also tried cached I/O but it scared me...)
> 
> The first thing I'm noticing is the horrible performance on the raid 
> disk, compared to the single standalone hard disk. Here is the output 
> from hdparm -tT on the single disk:
> 
> -([EMAIL PROTECTED])-(~)- # hdparm -tT /dev/sdb1
> 
> /dev/sdb1:
>  Timing cached reads:   1670 MB in  2.00 seconds = 835.00 MB/sec
>  Timing buffered disk reads:  140 MB in  3.01 seconds =  46.45 MB/sec
> 
> And then, the output from the raid-1 array:
> 
> -([EMAIL PROTECTED])-(~)- # hdparm -tT /dev/sda1
> 
> /dev/sda1:
>  Timing cached reads:   1718 MB in  2.00 seconds = 859.65 MB/sec
>  Timing buffered disk reads:   92 MB in  3.09 seconds =  29.76 MB/sec
> 
> I'm not sure what the deal is with the buffered disk reads being so much 
> WORSE than a single disk. So poor performance is a concern, but what's 
> more alarming are the messages showing up in DMESG. When I first tried 
> Cached IO - performance seemed good... except, dmesg was littered with 
> these errors (?):
> 
> megaraid: aborting-14610 cmd=2a 
> megaraid abort: scsi cmd:14610, do now own
> megaraid: aborting-14612 cmd=2a 
> megaraid abort: scsi cmd:14612